Longest Common Subsequence (LCS) — Complete Guide

This will be a complete deep understanding guide so you can use LCS intuition for many DP problems.

Subsequence vs Substring

  • Subsequence: Characters in order, but not necessarily contiguous.
  • Substring: Characters must be continuous.

#Subsequence

String = "ABCDE"

Valid subsequences:
A
ACE
BDE
ABCDE

#Substring
ABC  ✔
BCD  ✔
ACE  ❌
Python

Problem Definition

Given two strings. Find the length of the longest subsequence common to both strings.

Brute Force Thinking (Important for intuition)

At each character we have two choices

For strings

A B C D
A C B D


We can:
take a char
skip a char

Brute force explores all subsequences.

Total subsequences of length n = 2^n

So brute force complexity becomes: O(2^(n+m))  #Too slow.
n length of string 1
m length of string 2
Python

Key Insight

Compare characters: text1[i] and text2[j]

Case 1: Characters match

  • text1[i] == text2[j]
  • Then the character must be part of LCS.
  • So: LCS(i, j) = 1 + LCS(i+1, j+1)

Case 2: Characters don’t match

  • We skip one character.
  • Two possibilities:
    • skip text1[i]
    • skip text2[j]
  • So: LCS(i, j) = max(LCS(i+1, j), LCS(i, j+1))
                  LCS(i,j)
                 /       \
       (match) /          \ (mismatch)
              /            \
1 + LCS(i+1,j+1)        max(LCS(i+1,j),LCS(i,j+1))
Python

Recursive Code


#Variation 1
    
def lcs(i, j):
    if i == len(text1) or j == len(text2):
        return 0

    if text1[i] == text2[j]:
        return 1 + lcs(i+1, j+1)

    return max(
        lcs(i+1, j),
        lcs(i, j+1)
    )
    
    
    
#Variation 2    
    
def lcs(i, j):

    if i < 0 or j < 0:
        return 0

    if text1[i] == text2[j]:
        return 1 + lcs(i-1, j-1)

    return max(
        lcs(i-1, j),
        lcs(i, j-1)
    )
    
Python

Memoization (Top-Down DP)

The Repeating Subproblem Appears

         LCS(0,0)
       /         \
 LCS(1,0)      LCS(0,1)
     |             |
 LCS(2,1)      LCS(1,2)
     \           /
      \         /
       LCS(2,2)
               
      
      
LCS(2,2) is computed multiple times.               
Python

Code

def lcs(i, j):

    if i == len(text1) or j == len(text2):
        return 0

    if (i, j) in memo:
        return memo[(i, j)]

    if text1[i] == text2[j]:
        ans = 1 + lcs(i + 1, j + 1)
    else:
        ans = max(
            lcs(i + 1, j),
            lcs(i, j + 1)
        )

    memo[(i, j)] = ans
    return ans


text1 = "abcde"
text2 = "ace"
memo = {}

print(lcs(0, 0))
Python

  • Time : O(nm)
  • Space : O(nm)

Bottom-Up Code

Example

text1 = ABCD
text2 = ACBD

DP table

Initialize

      A C B D
   0 0 0 0 0
A  0
B  0
C  0
D  0


Final table

      A C B D
   0 0 0 0 0
A  0 1 1 1 1
B  0 1 1 2 2
C  0 1 2 2 2
D  0 1 2 2 3
Python

Code

def longestCommonSubsequence(text1, text2):

    n = len(text1)
    m = len(text2)

    dp = [[0]*(m+1) for _ in range(n+1)]

    for i in range(1, n+1):
        for j in range(1, m+1):

            if text1[i-1] == text2[j-1]:
                dp[i][j] = 1 + dp[i-1][j-1]

            else:
                dp[i][j] = max(
                    dp[i-1][j],
                    dp[i][j-1]
                )

    return dp[n][m]
Python

Reconstruct the LCS String

It’s so simple , just save the characters

if text1[i-1] == text2[j-1]:
    dp[i][j] = 1 + dp[i-1][j-1]
    #save charaters here
Python

Noop

we cannot use the same for loops for reconstruction is because reconstruction does not follow a fixed order like the DP filling phase.

Reconstruction is Different

  • Reconstruction does not visit every cell.
  • Instead, it follows one path from: (n, m) → (0, 0)
  # Reconstruct LCS
    i = n
    j = m
    lcs = []

    while i > 0 and j > 0:

        if text1[i-1] == text2[j-1]:
            lcs.append(text1[i-1])
            i -= 1
            j -= 1

        elif dp[i-1][j] > dp[i][j-1]:
            i -= 1
        else:
            j -= 1

    return ''.join(reversed(lcs))
Python

  • We move in reverse direction, then reversed the end string
  • dp[i-1][j] > dp[i][j-1] Meaning the best LCS came from: dp[i-1][j] So we move up. i -= 1
  • dp[i][j-1] >= dp[i-1][j] Meaning the best LCS came from: dp[i][j-1] So we move left. j -= 1

Longest Common Substring

s1 = "abcdgh"
s2 = "acdghr"

      a c d g h r
    0 0 0 0 0 0 0
a   0 1 0 0 0 0 0
b   0 0 0 0 0 0 0
c   0 0 1 0 0 0 0
d   0 0 0 2 0 0 0
g   0 0 0 0 3 0 0
h   0 0 0 0 0 4 0
Python

def longestCommonSubstring(s1, s2):

    n = len(s1)
    m = len(s2)

    dp = [[0]*(m+1) for _ in range(n+1)]

    longest = 0

    for i in range(1, n+1):
        for j in range(1, m+1):

            if s1[i-1] == s2[j-1]:
                dp[i][j] = 1 + dp[i-1][j-1]

                longest = max(longest, dp[i][j])

            else:
                dp[i][j] = 0

    return longest
Python

  • Time : O(n * m)
  • Space : O(n * m)

Space Optimized Version

We only need previous row.

def longestCommonSubstring(s1, s2):

    n = len(s1)
    m = len(s2)

    prev = [0]*(m+1)

    longest = 0

    for i in range(1, n+1):

        curr = [0]*(m+1)

        for j in range(1, m+1):

            if s1[i-1] == s2[j-1]:

                curr[j] = 1 + prev[j-1]

                longest = max(longest, curr[j])

        prev = curr

    return longest
Python

  • Space O(m)

Track the ending index.

def longestCommonSubstring(s1, s2):

    n = len(s1)
    m = len(s2)

    dp = [[0]*(m+1) for _ in range(n+1)]

    longest = 0
    end_index = 0

    for i in range(1, n+1):
        for j in range(1, m+1):

            if s1[i-1] == s2[j-1]:

                dp[i][j] = 1 + dp[i-1][j-1]

                if dp[i][j] > longest:
                    longest = dp[i][j]
                    end_index = i

    return s1[end_index-longest:end_index]
Python

Leave a Comment