# Introduction to Pattern Matching

Given a text, find all occurrences of a given pattern in it.

For example,

1. For text T = "ABCABAABCABAC" and pattern P = "ABAA",

The pattern occurs only once in the text, starting from index 3 ("ABCABAABCABAC")

2. For text T = "ABCABAABCABAC" and pattern P = "CAB",

The pattern occurs twice in the text, starting from index 2 and 8. ("ABCABAABCABAC" and "ABCABAABCABAC")

The pattern matching is very common real life problem that arises frequently in text-editing programs such as MS word, notepad, notepad++, etc. String pattern matching algorithms are also used to search for particular patterns in DNA sequences.

The goal is to find all occurrences of the pattern P[1..m] of length m in given text T[1..n] of length n. In this post, we will discuss the naive string-matching algorithm.

The naive algorithm finds all occurrences of pattern by using a loop that checks the condition
P[1..m] = T[s+1..s+m] for each of the n-m+1 possible values of s. Below is the psedo-code for the algorithm.

Naive-Pattern-Matching(T, P)

n = length[T]
m = length[P]

for s = 0 to n – m do
if P[1 .. m] = T[s+1 .. s+m] do
print "Pattern found at position s"
end-if
end-for

## C

Output:

Pattern occurs with shift 2
Pattern occurs with shift 8

## C++

Output:

Pattern occurs with shift 2
Pattern occurs with shift 8

## Java

Output:

Pattern occurs with shift 2
Pattern occurs with shift 8

### Performance:

The time complexity of above solution is O(nm) where n is the length of the text and m is the length of the pattern. When the length of the pattern and text are roughly same, the algorithm runs in the quadratic time.

One worst case is that text T has n number of A’s and the pattern P has (m-1) number of A’s followed by a single B. It will result in maximum comparisons. i.e.

T = AAAAAAAAAA, P = AAAB

The best case occurs when first character of the pattern P is not present in text. It will result in minimum comparisons. i.e.

T = ABCDEFGHIJKLMNO, P = ZOOM

There are many efficient string searching algorithms that performs way better than naive algorithm discussed above. Below are some of the famous algorithms –

• Knuth–Morris–Pratt algorithm
The KMP pattern matching algorithm has worst case time complexity of O(n + m).

• Rabin-Karp algorithm
The worst-case running time of the Rabin-Karp algorithm is O((n – m + 1) * m), but it has a good average-case running time. A practical application of this algorithm is detecting plagiarism. Given source material, the algorithm can rapidly search through a paper for instances of sentences from the source material.

• The Boyer-Moore algorithm
If the pattern P is relatively long and the text is reasonably large, then this algorithm is likely to be the most efficient string-matching algorithm. It has worst-case running time of O(n + m) only if the pattern does not appear in the text else it runs in O(nm) time in the worst case.