LeetCode 字符串处理详解：常见技巧与实战应用

VocabVictor2024-09-022024-09-04

引言

字符串处理是编程中最常见的任务之一，也是 LeetCode 中的重要题型。掌握字符串处理的各种技巧不仅能帮助我们更好地解决 LeetCode 问题，还能在实际编程中提高效率。本文将深入探讨字符串处理的基本概念、常见技巧以及在 LeetCode 上的实际应用。

1. 字符串基础

在开始之前，我们先回顾一下字符串的基本概念：

字符串是由字符组成的有限序列
在大多数编程语言中，字符串是不可变的
字符串的索引通常从 0 开始
字符串的长度可以用 len() 函数（在 Python 中）或 .length 属性（在 Java 中）获取

2. 常见字符串处理技巧

2.1 遍历字符串

#include <iostream>
#include <string>

int main() {
    std::string s = "Hello";
    for (char c : s) {
        std::cout << c << std::endl;
    }
    return 0;
}

2.2 字符串切片

#include <iostream>
#include <string>

int main() {
    std::string s = "Hello, World!";
    std::cout << s.substr(7, 5) << std::endl;  // 输出 "World"
    return 0;
}

2.3 字符串拼接

#include <iostream>
#include <string>

int main() {
    std::string s1 = "Hello";
    std::string s2 = "World";
    std::string s3 = s1 + " " + s2;
    std::cout << s3 << std::endl;  // 输出 "Hello World"
    return 0;
}

2.4 字符串反转

#include <iostream>
#include <string>
#include <algorithm>

int main() {
    std::string s = "Hello";
    std::reverse(s.begin(), s.end());
    std::cout << s << std::endl;  // 输出 "olleH"
    return 0;
}

2.5 字符串分割与连接

#include <iostream>
#include <string>
#include <vector>
#include <sstream>
#include <iterator>

int main() {
    std::string s = "a,b,c,d";
    std::vector<std::string> parts;
    std::stringstream ss(s);
    std::string item;

    while (std::getline(ss, item, ',')) {
        parts.push_back(item);
    }

    for (const auto& part : parts) {
        std::cout << part << " ";  // 输出 ['a', 'b', 'c', 'd']
    }
    std::cout << std::endl;

    std::string new_s;
    for (size_t i = 0; i < parts.size(); ++i) {
        new_s += parts[i];
        if (i != parts.size() - 1) {
            new_s += "-";
        }
    }

    std::cout << new_s << std::endl;  // 输出 "a-b-c-d"
    return 0;
}

3. LeetCode 实战应用

3.1 LeetCode 3. 最长的无重复字符子串

这是一个经典的滑动窗口问题，我们需要找出最长的无重复字符子串。

#include <unordered_map>
#include <string>
#include <algorithm>
class Solution {
    public:
    int lengthOfLongestSubstring(const std::string& s) {
        std::unordered_map<char, int> char_index;
        int max_length = 0, start = 0;
        for (int i = 0; i < s.length(); ++i) {
            char char = s[i];
            if (
                char_index.find(char) != char_index.end() 
                && start <= char_index[char]
            ) {
                start = char_index[char] + 1;
            } else {
                max_length = std::max(max_length, i - start + 1);
            }
            char_index[char] = i;
        }
        return max_length;
    }
};

3.2 LeetCode 5. 最长的回文子串

这是一个经典的字符串问题，我们需要找出最长的回文子串。

#include <string>
#include <algorithm>
class Solution {
    public:
    std::string longestPalindrome(const std::string& s) {
        if (s.empty()) {
            return "";
        }
        int start = 0, max_len = 1;
        auto expand_around_center = [&](int left, int right) {
            while (left >= 0 && right < s.length() && s[left] == s[right]) {
                left--;
                right++;
            }
            return right - left - 1;
        };
        for (int i = 0; i < s.length(); ++i) {
            int len1 = expand_around_center(i, i);
            int len2 = expand_around_center(i, i + 1);
            int length = std::max(len1, len2);
            if (length > max_len) {
                max_len = length;
                start = i - (length - 1) / 2;
            }
        }
        return s.substr(start, max_len);
    }
};

3.3 LeetCode 20. 有效的括号

这是一个使用栈来处理字符串的问题。

#include <stack>
#include <unordered_map>
#include <string>
class Solution {
    public:
    bool isValid(const std::string& s) {
        std::stack<char> stack;
        std::unordered_map<char, char> mapping = {
            {')', '('}, {'}', '{'}, {']', '['}
        };
        for (char char : s) {
            if (mapping.find(char) != mapping.end()) {
                if (stack.empty() || stack.top() != mapping[char]) {
                    return false;
                }
                stack.pop();
            } else {
                stack.push(char);
            }
        }
        return stack.empty();
    }
};

3.4 LeetCode 49. 字母异位词分组

这个问题展示了如何使用排序和哈希表来处理字符串。

#include <vector>
#include <string>
#include <unordered_map>
#include <algorithm>

class Solution {
public:
    std::vector<std::vector<std::string>> groupAnagrams(std::vector<std::string>& strs) {
        std::unordered_map<std::string, std::vector<std::string>> anagrams;
        
        for (const std::string& s : strs) {
            std::string sorted_s = s;
            std::sort(sorted_s.begin(), sorted_s.end());
            anagrams[sorted_s].push_back(s);
        }
        
        std::vector<std::vector<std::string>> result;
        for (const auto& pair : anagrams) {
            result.push_back(pair.second);
        }
        
        return result;
    }
};

4. 高级字符串处理技巧

4.1 KMP 算法

KMP（Knuth-Morris-Pratt）算法是一种高效的字符串匹配算法，特别适用于在一个长文本中查找一个模式串。

#include <vector>
#include <string>
#include <algorithm>

class Solution {
public:
    std::vector<int> computeLPSArray(const std::string& pattern) {
        int len = pattern.length();
        std::vector<int> lps(len, 0);
        int i = 1, len = 0;
        while (i < len) {
            if (pattern[i] == pattern[len]) {
                lps[i++] = ++len;
            } else {
                if (len != 0) {
                    len = lps[len - 1];
                } else {
                    lps[i++] = 0;
                }
            }
        }
        return lps;
    }

    int strStr(const std::string& haystack, const std::string& needle) {
        if (needle.empty()) {
            return 0;
        }
        std::vector<int> lps = computeLPSArray(needle);
        int i = 0, j = 0;
        while (i < haystack.length()) {
            if (haystack[i] == needle[j]) {
                i++;
                j++;
            }

            if (j == needle.length()) {
                return i - j;
            } else if (i < haystack.length() && haystack[i] != needle[j]) {
                if (j != 0) {
                    j = lps[j - 1];
                } else {
                    i++;
                }
            }
        }
        return -1;
    }
};

4.2 编辑距离

编辑距离（Edit Distance），又称Levenshtein距离，是指两个字符串之间，由一个转成另一个所需的最少编辑操作次数。许可的编辑操作包括将一个字符替换成另一个字符，插入一个字符，删除一个字符。

class Solution {
    public:
    int minDistance(const std::string& word1, const std::string& word2) {
        int m = word1.length();
        int n = word2.length();
        std::vector<std::vector<int>> dp(m + 1, std::vector<int>(n + 1, 0));

        for (int i = 0; i <= m; ++i) {
            dp[i][0] = i;
        }

        for (int j = 0; j <= n; ++j) {
            dp[0][j] = j;
        }

        for (int i = 1; i <= m; ++i) {
            for (int j = 1; j <= n; ++j) {
                if (word1[i - 1] == word2[j - 1]) {
                    dp[i][j] = dp[i - 1][j - 1];    
                } else {
                    dp[i][j] = std::min({dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1]}) + 1;
                }
            }
        }

        return dp[m][n];
    }
};

4.3 Rabin-Karp 算法

Rabin-Karp 算法是另一种字符串匹配算法，它使用哈希函数来简化比较过程。

def rabin_karp(text, pattern):
    d = 256  # 字符集大小
    q = 101  # 一个大素数
    n = len(text)
    m = len(pattern)
    h = pow(d, m - 1) % q
    p = t = 0

    for i in range(m):
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q

    for i in range(n - m + 1):
        if p == t:
            if text[i:i+m] == pattern:
                print(f"Pattern found at index {i}")

        if i < n - m:
            t = (d * (t - ord(text[i]) * h) + ord(text[i + m])) % q
            if t < 0:
                t += q

# 使用示例
text = "GEEKS FOR GEEKS"
pattern = "GEEK"
rabin_karp(text, pattern)

5. 字符串处理的常见问题类型

子串问题：如最长回文子串、无重复字符的最长子串
匹配问题：如通配符匹配、正则表达式匹配
编辑距离问题：如最小编辑距离、一次编辑
回文问题：如验证回文串、回文对
字符统计问题：如字符频率统计、异位词分组
字符串压缩与解压：如字符串压缩、解码字符串
字符串转换问题：如整数转罗马数字、驼峰命名转换

6. 字符串处理的优化技巧

使用合适的数据结构：如哈希表来存储字符频率
预处理：如构建前缀数组或后缀数组
滑动窗口：对于子串问题特别有效
双指针技巧：如快慢指针处理回文问题
位操作：对于只包含小写字母的字符串，可以使用位掩码
字符串哈希：快速比较子串是否相等
使用内置函数：如 split(), join(), replace() 等

7. 字符串处理中的常见陷阱

忽略边界条件：如空字符串、只有一个字符的字符串
字符编码问题：需要注意 ASCII 和 Unicode 的区别
大小写敏感性：有时需要先转换为小写或大写再处理
空白字符处理：注意处理空格、制表符、换行符等
字符串不可变性：在某些语言中，字符串是不可变的，需要创建新的字符串对象
整数溢出：在进行字符串和整数转换时要注意
时间复杂度：某些字符串操作可能导致意外的高时间复杂度

结语

字符串处理是一个广泛而深入的主题，涉及多种算法和技巧。通过本文的学习，我们深入了解了字符串处理的基本概念、LeetCode 实战应用以及各种高级技巧。在实际应用中，我们需要根据具体问题选择合适的算法和数据结构。记住，高效的字符串处理不仅需要了解各种技巧，还需要对问题有深入的理解和分析。通过不断练习和总结，我们可以在算法解题中越来越熟练地运用字符串处理技巧，解决各种复杂问题。同时，这些技能在实际的软件开发中也会非常有用，能够帮助我们编写更高效、更可靠的代码。