Given a text, find all occurrences of a given pattern in it.
For text T = “abcabaabcabac” and pattern P = “abaa”, the pattern occurs only once in the text, starting from index 3 (“abcabaabcabac”)
For text T = “abcabaabcabac” and pattern P = “cab”, the pattern occurs twice in the text, starting from index 2 and 8. (“abcabaabcabac” and “abcabaabcabac”)
Pattern matching is very common real life problem that arises frequently in text-editing programs such as MS word, notepad, notepad++, etc. String pattern matching algorithms are also used to search for particular patterns in DNA sequences.
The goal is to find all occurrences of the pattern P[1..m] of length m in given text T[1..n]of length n. We will discuss the naive string-matching algorithm in this post.
The naive algorithm finds all occurrences of pattern by using a loop that checks the condition P[1..m] = T[s+1..s+m] for each of the n-m+1 possible values of s. Below is the psedo-code for the algorithm.
n = length[T]
m = length[P]
for s = 0 to n – m
if P[1 .. m] = T[s+1 .. s+m]
print “Pattern found at position s”
C++ implementation –
using namespace std;
void find(string text, string pattern, int n, int m)
int i = 0;
while (i <= n - m)
for (int j = 0; j < m; j++)
if (text[i + j] != pattern[j])
if (j == m - 1)
cout << "Pattern occurs with shift " << i << endl;
// main function
string text = "abcabaabcabac";
string pattern = "cab";
int n = text.length();
int m = pattern.length();
find(text, pattern, n, m);
Pattern occurs with shift 2
Pattern occurs with shift 8
The time complexity of above solution is O(nm). When the length of the pattern and text are roughly equal, the algorithm runs in the quadratic time.
One worst case is that text T has n number of A’s and the pattern P has (m-1) number of A’s followed by a single B. It will result in maximum comparisons. i.e.
T = AAAAAAAAAA, P = AAAB
The best case occurs when first character of the pattern P is not present in text. It will result in minimum comparisons. i.e.
T = ABCDEFGHIJKLMNO, P = ZOOM
There are many efficient string searching algorithms that performs way better than naive algorithm discussed above. Below are some of the famous algorithms –
- Knuth–Morris–Pratt algorithm
The KMP pattern matching algorithm has worst case time complexity of O(n + m).
- Rabin-Karp algorithm
The worst-case running time of the Rabin-Karp algorithm is O((n – m + 1)m), but it has a good average-case running time. A practical application of this algorithm is detecting plagiarism. Given source material, the algorithm can rapidly search through a paper for instances of sentences from the source material.
- The Boyer-Moore algorithm
If the pattern P is relatively long and the text is reasonably large, then this algorithm is likely to be the most efficient string-matching algorithm. It has worst-case running time of O(n + m) only if the pattern does not appear in the text else it runs in O(nm) time in the worst case.
Thanks for reading.