Longest Common Subsequence (LCS) — Complete Guide

This will be a complete deep understanding guide so you can use LCS intuition for many DP problems.

Subsequence vs Substring

Subsequence: Characters in order, but not necessarily contiguous.
Substring: Characters must be continuous.

#Subsequence

String = "ABCDE"

Valid subsequences:
A
ACE
BDE
ABCDE

#Substring
ABC  ✔
BCD  ✔
ACE  ❌


#Subsequence

String = "ABCDE"

Valid subsequences:
A
ACE
BDE
ABCDE

#Substring
ABC  ✔
BCD  ✔
ACE  ❌

Python

Problem Definition

Given two strings. Find the length of the longest subsequence common to both strings.

Brute Force Thinking (Important for intuition)

At each character we have two choices

For strings

A B C D
A C B D


We can:
take a char
skip a char

Brute force explores all subsequences.

Total subsequences of length n = 2^n

So brute force complexity becomes: O(2^(n+m))  #Too slow.
n length of string 1
m length of string 2

For strings

A B C D
A C B D


We can:
take a char
skip a char

Brute force explores all subsequences.

Total subsequences of length n = 2^n

So brute force complexity becomes: O(2^(n+m))  #Too slow.
n length of string 1
m length of string 2

Python

Key Insight

Compare characters: text1[i] and text2[j]

Case 1: Characters match

text1[i] == text2[j]
Then the character must be part of LCS.
So: LCS(i, j) = 1 + LCS(i+1, j+1)

Case 2: Characters don’t match

We skip one character.
Two possibilities:
- skip text1[i]
- skip text2[j]
So: LCS(i, j) = max(LCS(i+1, j), LCS(i, j+1))

                  LCS(i,j)
                 /       \
       (match) /          \ (mismatch)
              /            \
1 + LCS(i+1,j+1)        max(LCS(i+1,j),LCS(i,j+1))

                  LCS(i,j)
                 /       \
       (match) /          \ (mismatch)
              /            \
1 + LCS(i+1,j+1)        max(LCS(i+1,j),LCS(i,j+1))

Python

Recursive Code

#Variation 1
    
def lcs(i, j):
    if i == len(text1) or j == len(text2):
        return 0

    if text1[i] == text2[j]:
        return 1 + lcs(i+1, j+1)

    return max(
        lcs(i+1, j),
        lcs(i, j+1)
    )
    
    
    
#Variation 2    
    
def lcs(i, j):

    if i < 0 or j < 0:
        return 0

    if text1[i] == text2[j]:
        return 1 + lcs(i-1, j-1)

    return max(
        lcs(i-1, j),
        lcs(i, j-1)
    )


#Variation 1
    
def lcs(i, j):
    if i == len(text1) or j == len(text2):
        return 0

    if text1[i] == text2[j]:
        return 1 + lcs(i+1, j+1)

    return max(
        lcs(i+1, j),
        lcs(i, j+1)
    )
    
    
    
#Variation 2    
    
def lcs(i, j):

    if i < 0 or j < 0:
        return 0

    if text1[i] == text2[j]:
        return 1 + lcs(i-1, j-1)

    return max(
        lcs(i-1, j),
        lcs(i, j-1)
    )

Python

Memoization (Top-Down DP)

The Repeating Subproblem Appears

         LCS(0,0)
       /         \
 LCS(1,0)      LCS(0,1)
     |             |
 LCS(2,1)      LCS(1,2)
     \           /
      \         /
       LCS(2,2)
               
      
      
LCS(2,2) is computed multiple times.

         LCS(0,0)
       /         \
 LCS(1,0)      LCS(0,1)
     |             |
 LCS(2,1)      LCS(1,2)
     \           /
      \         /
       LCS(2,2)
               
      
      
LCS(2,2) is computed multiple times.

Python

Code

def lcs(i, j):

    if i == len(text1) or j == len(text2):
        return 0

    if (i, j) in memo:
        return memo[(i, j)]

    if text1[i] == text2[j]:
        ans = 1 + lcs(i + 1, j + 1)
    else:
        ans = max(
            lcs(i + 1, j),
            lcs(i, j + 1)
        )

    memo[(i, j)] = ans
    return ans


text1 = "abcde"
text2 = "ace"
memo = {}

print(lcs(0, 0))

def lcs(i, j):

    if i == len(text1) or j == len(text2):
        return 0

    if (i, j) in memo:
        return memo[(i, j)]

    if text1[i] == text2[j]:
        ans = 1 + lcs(i + 1, j + 1)
    else:
        ans = max(
            lcs(i + 1, j),
            lcs(i, j + 1)
        )

    memo[(i, j)] = ans
    return ans


text1 = "abcde"
text2 = "ace"
memo = {}

print(lcs(0, 0))

Python

Time : O(nm)
Space : O(nm)

Bottom-Up Code

Example

text1 = ABCD
text2 = ACBD

DP table

Initialize

      A C B D
   0 0 0 0 0
A  0
B  0
C  0
D  0


Final table

      A C B D
   0 0 0 0 0
A  0 1 1 1 1
B  0 1 1 2 2
C  0 1 2 2 2
D  0 1 2 2 3

text1 = ABCD
text2 = ACBD

DP table

Initialize

      A C B D
   0 0 0 0 0
A  0
B  0
C  0
D  0


Final table

      A C B D
   0 0 0 0 0
A  0 1 1 1 1
B  0 1 1 2 2
C  0 1 2 2 2
D  0 1 2 2 3

Python

Code

def longestCommonSubsequence(text1, text2):

    n = len(text1)
    m = len(text2)

    dp = [[0]*(m+1) for _ in range(n+1)]

    for i in range(1, n+1):
        for j in range(1, m+1):

            if text1[i-1] == text2[j-1]:
                dp[i][j] = 1 + dp[i-1][j-1]

            else:
                dp[i][j] = max(
                    dp[i-1][j],
                    dp[i][j-1]
                )

    return dp[n][m]

def longestCommonSubsequence(text1, text2):

    n = len(text1)
    m = len(text2)

    dp = [[0]*(m+1) for _ in range(n+1)]

    for i in range(1, n+1):
        for j in range(1, m+1):

            if text1[i-1] == text2[j-1]:
                dp[i][j] = 1 + dp[i-1][j-1]

            else:
                dp[i][j] = max(
                    dp[i-1][j],
                    dp[i][j-1]
                )

    return dp[n][m]

Python

Reconstruct the LCS String

It’s so simple , just save the characters

if text1[i-1] == text2[j-1]:
    dp[i][j] = 1 + dp[i-1][j-1]
    #save charaters here

if text1[i-1] == text2[j-1]:
    dp[i][j] = 1 + dp[i-1][j-1]
    #save charaters here

Python

Noop

we cannot use the same for loops for reconstruction is because reconstruction does not follow a fixed order like the DP filling phase.

Reconstruction is Different

Reconstruction does not visit every cell.
Instead, it follows one path from: (n, m) → (0, 0)

  # Reconstruct LCS
    i = n
    j = m
    lcs = []

    while i > 0 and j > 0:

        if text1[i-1] == text2[j-1]:
            lcs.append(text1[i-1])
            i -= 1
            j -= 1

        elif dp[i-1][j] > dp[i][j-1]:
            i -= 1
        else:
            j -= 1

    return ''.join(reversed(lcs))

  # Reconstruct LCS
    i = n
    j = m
    lcs = []

    while i > 0 and j > 0:

        if text1[i-1] == text2[j-1]:
            lcs.append(text1[i-1])
            i -= 1
            j -= 1

        elif dp[i-1][j] > dp[i][j-1]:
            i -= 1
        else:
            j -= 1

    return ''.join(reversed(lcs))

Python

We move in reverse direction, then reversed the end string
dp[i-1][j] > dp[i][j-1] Meaning the best LCS came from: dp[i-1][j] So we move up. i -= 1
dp[i][j-1] >= dp[i-1][j] Meaning the best LCS came from: dp[i][j-1] So we move left. j -= 1

Longest Common Substring

s1 = "abcdgh"
s2 = "acdghr"

      a c d g h r
    0 0 0 0 0 0 0
a   0 1 0 0 0 0 0
b   0 0 0 0 0 0 0
c   0 0 1 0 0 0 0
d   0 0 0 2 0 0 0
g   0 0 0 0 3 0 0
h   0 0 0 0 0 4 0

s1 = "abcdgh"
s2 = "acdghr"

      a c d g h r
    0 0 0 0 0 0 0
a   0 1 0 0 0 0 0
b   0 0 0 0 0 0 0
c   0 0 1 0 0 0 0
d   0 0 0 2 0 0 0
g   0 0 0 0 3 0 0
h   0 0 0 0 0 4 0

Python

def longestCommonSubstring(s1, s2):

    n = len(s1)
    m = len(s2)

    dp = [[0]*(m+1) for _ in range(n+1)]

    longest = 0

    for i in range(1, n+1):
        for j in range(1, m+1):

            if s1[i-1] == s2[j-1]:
                dp[i][j] = 1 + dp[i-1][j-1]

                longest = max(longest, dp[i][j])

            else:
                dp[i][j] = 0

    return longest

def longestCommonSubstring(s1, s2):

    n = len(s1)
    m = len(s2)

    dp = [[0]*(m+1) for _ in range(n+1)]

    longest = 0

    for i in range(1, n+1):
        for j in range(1, m+1):

            if s1[i-1] == s2[j-1]:
                dp[i][j] = 1 + dp[i-1][j-1]

                longest = max(longest, dp[i][j])

            else:
                dp[i][j] = 0

    return longest

Python

Time : O(n * m)
Space : O(n * m)

Space Optimized Version

We only need previous row.

def longestCommonSubstring(s1, s2):

    n = len(s1)
    m = len(s2)

    prev = [0]*(m+1)

    longest = 0

    for i in range(1, n+1):

        curr = [0]*(m+1)

        for j in range(1, m+1):

            if s1[i-1] == s2[j-1]:

                curr[j] = 1 + prev[j-1]

                longest = max(longest, curr[j])

        prev = curr

    return longest

def longestCommonSubstring(s1, s2):

    n = len(s1)
    m = len(s2)

    prev = [0]*(m+1)

    longest = 0

    for i in range(1, n+1):

        curr = [0]*(m+1)

        for j in range(1, m+1):

            if s1[i-1] == s2[j-1]:

                curr[j] = 1 + prev[j-1]

                longest = max(longest, curr[j])

        prev = curr

    return longest

Python

Space O(m)

Print the Substring

Track the ending index.

def longestCommonSubstring(s1, s2):

    n = len(s1)
    m = len(s2)

    dp = [[0]*(m+1) for _ in range(n+1)]

    longest = 0
    end_index = 0

    for i in range(1, n+1):
        for j in range(1, m+1):

            if s1[i-1] == s2[j-1]:

                dp[i][j] = 1 + dp[i-1][j-1]

                if dp[i][j] > longest:
                    longest = dp[i][j]
                    end_index = i

    return s1[end_index-longest:end_index]

def longestCommonSubstring(s1, s2):

    n = len(s1)
    m = len(s2)

    dp = [[0]*(m+1) for _ in range(n+1)]

    longest = 0
    end_index = 0

    for i in range(1, n+1):
        for j in range(1, m+1):

            if s1[i-1] == s2[j-1]:

                dp[i][j] = 1 + dp[i-1][j-1]

                if dp[i][j] > longest:
                    longest = dp[i][j]
                    end_index = i

    return s1[end_index-longest:end_index]

Python

Longest Common Subsequence (LCS) — Complete Guide

Table of Contents

Subsequence vs Substring

Problem Definition

Brute Force Thinking (Important for intuition)

Key Insight

Recursive Code

Memoization (Top-Down DP)

Bottom-Up Code

Reconstruct the LCS String

Longest Common Substring

Print the Substring

About Puneet Verma

Leave a Comment Cancel reply