Introduction to Pattern Matching

Given a text, find all occurrences of a given pattern in it.

 
For example,

1. For text T = "ABCABAABCABAC" and pattern P = "ABAA",

The pattern occurs only once in the text, starting from index 3 ("ABCABAABCABAC")

 
2. For text T = "ABCABAABCABAC" and pattern P = "CAB",

The pattern occurs twice in the text, starting from index 2 and 8. ("ABCABAABCABAC" and "ABCABAABCABAC")

 
The pattern matching is very common real life problem that arises frequently in text-editing programs such as MS word, notepad, notepad++, etc. String pattern matching algorithms are also used to search for particular patterns in DNA sequences.

The goal is to find all occurrences of the pattern P[1..m] of length m in given text T[1..n] of length n. In this post, we will discuss the naive string-matching algorithm.

 
The naive algorithm finds all occurrences of pattern by using a loop that checks the condition
P[1..m] = T[s+1..s+m] for each of the n-m+1 possible values of s. Below is the psedo-code for the algorithm.

 
Naive-Pattern-Matching(T, P)

n = length[T]
m = length[P]

for s = 0 to n – m do
    if P[1 .. m] = T[s+1 .. s+m] do
        print "Pattern found at position s"
    end-if
end-for

 

C

Download   Run Code

Output:

Pattern occurs with shift 2
Pattern occurs with shift 8

C++

Download   Run Code

Output:

Pattern occurs with shift 2
Pattern occurs with shift 8

Java

Download   Run Code

Output:

Pattern occurs with shift 2
Pattern occurs with shift 8

 

Performance:

 
The time complexity of above solution is O(nm) where n is the length of the text and m is the length of the pattern. When the length of the pattern and text are roughly same, the algorithm runs in the quadratic time.

One worst case is that text T has n number of A’s and the pattern P has (m-1) number of A’s followed by a single B. It will result in maximum comparisons. i.e.

T = AAAAAAAAAA, P = AAAB

The best case occurs when first character of the pattern P is not present in text. It will result in minimum comparisons. i.e.

T = ABCDEFGHIJKLMNO, P = ZOOM
 

There are many efficient string searching algorithms that performs way better than naive algorithm discussed above. Below are some of the famous algorithms –

  • Knuth–Morris–Pratt algorithm
    The KMP pattern matching algorithm has worst case time complexity of O(n + m).
     
  • Rabin-Karp algorithm
    The worst-case running time of the Rabin-Karp algorithm is O((n – m + 1) * m), but it has a good average-case running time. A practical application of this algorithm is detecting plagiarism. Given source material, the algorithm can rapidly search through a paper for instances of sentences from the source material.
     
  • The Boyer-Moore algorithm
    If the pattern P is relatively long and the text is reasonably large, then this algorithm is likely to be the most efficient string-matching algorithm. It has worst-case running time of O(n + m) only if the pattern does not appear in the text else it runs in O(nm) time in the worst case.

 
References: http://staff.ustc.edu.cn/~csli/graduate/algorithms/book6/chap34.htm

 
1 Star2 Stars3 Stars4 Stars5 Stars (1 votes, average: 5.00 out of 5)

Loading...

Thanks for reading.

Please use our online compiler to post code in comments. To contribute, get in touch with us.
Like us? Please spread the word and help us grow. Happy coding 🙂
 



Leave a Reply

avatar
  Subscribe  
Notify of