Monday 23 September 2024

Knuth Morris Pratt Algorithm & Other associated methods

Knuth Morris Pratt Algorithm & Other associated methods

I have always wanted to learn more about Data Structures and Algorithms.

However, owing to the non-CS background of my graduation, I never had the time to do so.

I thought, I would give it a try and started solving LeetCode problems on 20-September-2024.

As soon as I logged in, the first problem that was thrown at me was the Shortest Palindrome Problem.

Problem description is as follows:

You are given a string s. You can convert s to a palindrome by adding characters in front of it.

Return the shortest palindrome you can find by performing this
transformation
.

Example 1:
Input: s = “aacecaaa”
Output: “aaacecaaa”

Example 2:
Input: s = “abcd”
Output: “dcbabcd”

Constraints:

  • 0 <= s.length <= 5 * 104
  • s consists of lowercase English letters only.

I went ahead and tried solving the problem from a pseudocode POV:

  1. Check the length of s to see that the constraint is satisfied
  2. If len(s)=0, return result as s
  3. Start searching s and eleminating each element from the back with each iteration, that means check if each s[0:n] or s[0:(n-1)] or s[0:(n-2)] and so on is a palindrome.
  4. We assume, for now, that we are given a helper function isPalindrome, which compares the 0th element with the nth element and (0+1)th element with the (n-1)th element.
  5. Find the shortest possible palindrome - sp.
  6. Then reverse the remaining part of the string and add it to the start of sp.

Now, if we see the above in action, it will look pretty messy, however, we go ahead and check it anyway:

  1. Assume we have string : s=abcd.
  2. If len(s) = 0, return s
  3. Else, run around s, check, if
    1. s[0] = s[len(s)-1] = FALSE : ada \neq d
    2. s[0] = s[len(s)-2] = FALSE : aca \neq c
    3. s[0] = s[len(s)-3] = FALSE : aba \neq b
    4. s[0] = s[len(s)-4] = TRUE : a=aa = a
  4. isPalindrome[s[0]]=TRUE
  5. s' = reverse[s[1:n]] : dcbdcb
  6. concatenate [s', s] : dcbabcddcbabcd

I solved it using this, however, you will notice that there are 2 linear loops running,

  1. loop for checking through each element : O(n)O(n)
  2. isPalindrome loop : O(n)O(n)

The complexity of the above will be O(n2)O(n^2).


I was happy :) and submitted, after submission, I saw that there are multiple other algorithmic approaches that give a much better and faster solution to this solution.

This is when I came across the Knuth Morris Pratt Algorithm.


I tried to understand this from a Mathematician’s perspective before getting into the programming.

Often, the computer scientists over simplify a mathematical method and call it an algorithm, this is precisely why, a solution that would have taken an afternoon for a CS grad, is taking more than 2 days for a Mathematician.


Knuth Morris Pratt is an efficient algorithm for substring search 1.

  • Prefix Function Definition:
    You are given a string s of length n. The prefix function for this string is defined as an array π\pi of length n, where π(i)\pi(i) is the length of the longest proper prefix of the substring s[0...i]s[0...i] which is also a suffix of this substring. A proper prefix of a string is a prefix that is not equal to the string itself. By definition,   π[0]=0\pi[0]=0.

Mathematically the definition of the prefix function can be written as follows:
π[i]=maxk=0i{k:s[0k1]=s[i(k1)i]}\pi[i] = \max_ {k = 0 \dots i} \{k : s[0 \dots k-1] = s[i-(k-1) \dots i] \}

No comments:

Post a Comment