Finding the similarity between two pieces of text is a foundational challenge in computer science. Instead of just counting how many characters we change, this problem asks us to consider the specific "cost" of each character based on its ASCII value.
Problem Summary
You're given:
- Two strings,
s1ands2.
Your goal:
- Find the lowest total ASCII sum of deleted characters required to make the two strings identical.
Intuition
This problem is a variation of the Longest Common Subsequence (LCS) or Edit Distance problems. The core idea is to decide, for every character pair, whether to keep them or delete them to reach a state where both strings are equal.
We use Dynamic Programming (DP) with memoization to explore these choices:
- The Base Case: If we reach the end of one string, the only way to make the strings equal is to delete all remaining characters in the other string. We sum up their ASCII values and return that as the cost.
-
The Match: If the characters
s1[i]ands2[j]are the same, it costs us 0 to keep them. We simply move to the next pair of characters. - The Mismatch: If they don't match, we have two primary choices:
- Delete
s1[i]and pay its ASCII price, then compare the rest ofs1with the currents2. Delete
s2[j]and pay its ASCII price, then compare the currents1with the rest ofs2.The Optimization: We always choose the path that yields the minimum sum. By storing the results in a 2D array (
dp), we avoid calculating the same sub-problem multiple times.
Code Blocks
C++ Solution
class Solution {
public:
int helper(string& s1, string& s2, int i, int j, vector<vector<int>>& dp) {
int n = s1.size();
int m = s2.size();
// If s1 is exhausted, delete remaining characters of s2
if (i == n) {
int sum = 0;
for (int k = j; k < m; k++) sum += s2[k];
return sum;
}
// If s2 is exhausted, delete remaining characters of s1
if (j == m) {
int sum = 0;
for (int k = i; k < n; k++) sum += s1[k];
return sum;
}
if (dp[i][j] != -1) return dp[i][j];
if (s1[i] == s2[j]) {
// Characters match, no cost added
return dp[i][j] = helper(s1, s2, i + 1, j + 1, dp);
} else {
// Choice 1: Delete s1[i]
int deleteS1 = s1[i] + helper(s1, s2, i + 1, j, dp);
// Choice 2: Delete s2[j]
int deleteS2 = s2[j] + helper(s1, s2, i, j + 1, dp);
return dp[i][j] = min(deleteS1, deleteS2);
}
}
int minimumDeleteSum(string s1, string s2) {
int n = s1.size();
int m = s2.size();
vector<vector<int>> dp(n, vector<int>(m, -1));
return helper(s1, s2, 0, 0, dp);
}
};
Python Solution
class Solution:
def minimumDeleteSum(self, s1: str, s2: str) -> int:
n, m = len(s1), len(s2)
dp = [[-1] * m for _ in range(n)]
def helper(i, j):
# Base case: s1 exhausted
if i == n:
return sum(ord(c) for c in s2[j:])
# Base case: s2 exhausted
if j == m:
return sum(ord(c) for c in s1[i:])
if dp[i][j] != -1:
return dp[i][j]
if s1[i] == s2[j]:
dp[i][j] = helper(i + 1, j + 1)
else:
# Compare cost of deleting from s1 vs deleting from s2
delete_s1 = ord(s1[i]) + helper(i + 1, j)
delete_s2 = ord(s2[j]) + helper(i, j + 1)
dp[i][j] = min(delete_s1, delete_s2)
return dp[i][j]
return helper(0, 0)
JavaScript Solution
/**
* @param {string} s1
* @param {string} s2
* @return {number}
*/
var minimumDeleteSum = function(s1, s2) {
const n = s1.length;
const m = s2.length;
const dp = Array.from({ length: n }, () => Array(m).fill(-1));
function helper(i, j) {
// Base case: s1 exhausted
if (i === n) {
let sum = 0;
for (let k = j; k < m; k++) sum += s2.charCodeAt(k);
return sum;
}
// Base case: s2 exhausted
if (j === m) {
let sum = 0;
for (let k = i; k < n; k++) sum += s1.charCodeAt(k);
return sum;
}
if (dp[i][j] !== -1) return dp[i][j];
if (s1[i] === s2[j]) {
dp[i][j] = helper(i + 1, j + 1);
} else {
// Minimum of deleting s1[i] or s2[j]
const deleteS1 = s1.charCodeAt(i) + helper(i + 1, j);
const deleteS2 = s2.charCodeAt(j) + helper(i, j + 1);
dp[i][j] = Math.min(deleteS1, deleteS2);
}
return dp[i][j];
}
return helper(0, 0);
};
Key Takeaways
- Memoization: By caching results in a array, we reduce the time complexity from exponential to , where and are the lengths of the strings.
-
ASCII Mapping: Character values are not just labels. Using
ord()in Python orcharCodeAt()in JavaScript allows us to treat text as numerical data for optimization. -
Decision Trees: This problem teaches you how to model choices (delete
s1vs deletes2) as a tree of recursive calls.
Final Thoughts
This problem is a classic example of how minor tweaks to a standard algorithm can change its application. In real-world software engineering, this logic is used in Bioinformatics to align DNA sequences or in Version Control Systems (like Git) to calculate the "diff" between two file versions. Mastering these weighted string problems will make you much more effective at building search engines or comparison tools.
Top comments (0)