Given a text, find all occurrences of a given pattern in it.

For example,

`1. For text T = "ABCABAABCABAC" and pattern P = "ABAA", `

`The pattern occurs only once in the text, starting from index 3 ("ABC ABAABCABAC")`

`2. For text T = "ABCABAABCABAC" and pattern P = "CAB",`

`The pattern occurs twice in the text, starting from index 2 and 8. ("AB CABAABCABAC" and "ABCABAABCABAC")`

*The pattern matching is very common real life problem that arises frequently in text-editing programs such as MS word, notepad, notepad++, etc. String pattern matching algorithms are also used to search for particular patterns in DNA sequences.*

The goal is to find all occurrences of the pattern P[1..m] of length m in given text T[1..n] of length n. In this post, we will discuss the naive string-matching algorithm.

The naive algorithm finds all occurrences of pattern by using a loop that checks the condition

P[1..m] = T[s+1..s+m] for each of the n-m+1 possible values of s. Below is the psedo-code for the algorithm.

**Naive-Pattern-Matching(T, P)**

n = length[T]

m = length[P]

**for** s = 0 to n – m **do**

**if** P[1 .. m] = T[s+1 .. s+m] **do**

**print** `"Pattern found at position s"`

**end-if**

**end-for**

## C

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
#include <stdio.h> #include <string.h> // Function to find all occurrences of a pattern of length m // in given text of length n void find(const char* text, const char* pattern, int n, int m) { for (int i = 0; i <= n - m; i++) { for (int j = 0; j < m; j++) { if (text[i + j] != pattern[j]) break; if (j == m - 1) printf("Pattern occurs with shift %d\n", i); } } } // Program to demonstrate Naive Pattern Matching Algorithm in C int main(void) { char* text = "ABCABAABCABAC"; char* pattern = "CAB"; int n = strlen(text); int m = strlen(pattern); find(text, pattern, n, m); return 0; } |

`Output: `

Pattern occurs with shift 2

Pattern occurs with shift 8

## C++

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
#include <iostream> using namespace std; // Function to find all occurrences of a pattern of length m // in given text of length n void find(string text, string pattern) { int n = text.length(); int m = pattern.length(); for (int i = 0; i <= n - m; i++) { for (int j = 0; j < m; j++) { if (text[i + j] != pattern[j]) break; if (j == m - 1) cout << "Pattern occurs with shift " << i << endl; } } } // Program to demonstrate Naive Pattern Matching Algorithm in C++ int main() { string text = "ABCABAABCABAC"; string pattern = "CAB"; find(text, pattern); return 0; } |

`Output: `

Pattern occurs with shift 2

Pattern occurs with shift 8

## Java

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
class PatternMatching { // Function to find all occurrences of a pattern of length m // in given text of length n public static void find(String text, String pattern) { int n = text.length(); int m = pattern.length(); int i = 0; while (i <= n - m) { for (int j = 0; j < m; j++) { if (text.charAt(i + j) != pattern.charAt(j)) break; if (j == m - 1) System.out.println("Pattern occurs with shift " + i); } i++; } } // Program to demonstrate Naive Pattern Matching Algorithm in Java public static void main(String[] args) { String text = "ABCABAABCABAC"; String pattern = "CAB"; find(text, pattern); } } |

`Output: `

Pattern occurs with shift 2

Pattern occurs with shift 8

### Performance:

The time complexity of above solution is O(nm) where n is the length of the text and m is the length of the pattern. When the length of the pattern and text are roughly same, the algorithm runs in the quadratic time.

One worst case is that text T has n number of A’s and the pattern P has (m-1) number of A’s followed by a single B. It will result in maximum comparisons. i.e.

T = AAAAAAAAAA, P = AAAB

The best case occurs when first character of the pattern P is not present in text. It will result in minimum comparisons. i.e.

T = ABCDEFGHIJKLMNO, P = ZOOM

There are many efficient string searching algorithms that performs way better than naive algorithm discussed above. Below are some of the famous algorithms –

- Knuth–Morris–Pratt algorithm

The KMP pattern matching algorithm has worst case time complexity of O(n + m).

- Rabin-Karp algorithm

The worst-case running time of the Rabin-Karp algorithm is O((n – m + 1) * m), but it has a good average-case running time. A practical application of this algorithm is detecting plagiarism. Given source material, the algorithm can rapidly search through a paper for instances of sentences from the source material.

- The Boyer-Moore algorithm

If the pattern P is relatively long and the text is reasonably large, then this algorithm is likely to be the most efficient string-matching algorithm. It has worst-case running time of O(n + m) only if the pattern does not appear in the text else it runs in O(nm) time in the worst case.

**References:** http://staff.ustc.edu.cn/~csli/graduate/algorithms/book6/chap34.htm

**Thanks for reading.**

Please use our online compiler to post code in comments.

Like us? Please spread the word and help us grow. Happy coding 🙂

## Leave a Reply