Chapter 9 String Algorithms: KMP, Rabin-Karp, and the AC Automaton

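In one sentence: string matching is about using what you already know to skip positions where a match is impossible. KMP skips ahead using the prefix function. Rabin-Karp avoids most character comparisons by comparing hashes. The AC automaton carries KMP's failure links over to a Trie, so many patterns are matched at once.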

📋 Prerequisites: Data Structures Ch8 (Trie), Ch5 (hash tables)

9.1 Intuition: How the String-Matching Problem Evolved

From Brute Force to Smart Skipping

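Problem: find every occurrence of pattern P in text S.

Brute force: O(n*m)
  After each failed attempt, the pointer into S advances only 1 step → many repeated comparisons

KMP (1977): O(n+m)
  Preprocess P's "prefix function" → on a mismatch, reuse what is already known to skip ahead

Rabin-Karp (1987): O(n+m) on average (O(n*m) worst case)
  Compare hashes quickly → compare character by character only when the hashes match

AC automaton (1975): O(n + total_matches) per scan
  Build all patterns into a Trie + fail links → one scan matches every pattern

Manacher (1975): O(n)
  A dedicated O(n) algorithm for the longest palindromic substring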

[Figure: how the algorithms relate. Brute force O(n*m) → (reuse failure information) → KMP, O(n+m), single pattern → (extend to multiple patterns) → AC automaton, O(n+matches), multiple patterns. Brute force → (fast hash comparison) → Rabin-Karp, O(n+m) average, single pattern → (content deduplication) → content hashing / file fingerprints.]

9.2 String Basics

String Hashing: The Polynomial Hash

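Treat the string as a number in base BASE and reduce it modulo MOD to get its hash. After O(n) preprocessing of prefix hashes and powers of BASE, you can get the hash of any substring in O(1).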

// Polynomial hash: hash(s) = (s[0]*BASE^{n-1} + s[1]*BASE^{n-2} + ... + s[n-1]) % MOD
#include <string>
#include <vector>

class StringHash {
    using ULL = unsigned long long;
    static constexpr ULL BASE = 131;              // common bases: 131, 13331
    static constexpr ULL MOD  = 1'000'000'007ULL; // or drop % MOD and rely on ULL wrap-around

    std::vector<ULL> _hash;   // _hash[i] = hash of s[0..i-1]
    std::vector<ULL> _power;  // _power[i] = BASE^i % MOD

public:
    explicit StringHash(const std::string& s) {
        int n = s.size();
        _hash.resize(n + 1);
        _power.resize(n + 1);

        _hash[0] = 0;
        _power[0] = 1;
        for (int i = 0; i < n; ++i) {
            // cast through unsigned char so bytes >= 0x80 are not treated as negative
            _hash[i + 1] = (_hash[i] * BASE + static_cast<unsigned char>(s[i])) % MOD;
            _power[i + 1] = (_power[i] * BASE) % MOD;
        }
    }

    // Hash of substring s[l..r] (0-indexed, inclusive)
    ULL getHash(int l, int r) const {
        // Subtract the prefix s[0..l-1] shifted by BASE^{len}; adding MOD first keeps the value non-negative
        return (_hash[r + 1] + MOD - _hash[l] * _power[r - l + 1] % MOD) % MOD;
    }
};

// Usage:
// StringHash sh("hello");
// sh.getHash(1, 3) → hash of "ell"

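⚠️ A single hash can collide. In production, use double hashing (two different BASE/MOD pairs). Dropping the modulus and letting unsigned long long wrap around (arithmetic mod 2^64) is fast and usually fine on random data. It is not safe against adversarial input, though: well-known constructions, such as those based on the Thue-Morse sequence, produce collisions regardless of the base.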
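Below is a minimal sketch of the double hashing recommended above. It assumes the commonly used (BASE, MOD) pairs (131, 1e9+7) and (13331, 998244353). The class name `DoubleHash` and its `get` method are illustrative, not part of the original code. Two substrings count as equal only when both components agree. Both components are packed into one 64-bit key, so the result can go straight into a hash set or map.

```cpp
#include <string>
#include <vector>

// Double polynomial hash: two independent (BASE, MOD) pairs (illustrative constants).
class DoubleHash {
    static constexpr unsigned long long MOD1 = 1'000'000'007ULL, BASE1 = 131;
    static constexpr unsigned long long MOD2 = 998'244'353ULL,   BASE2 = 13331;
    std::vector<unsigned long long> _h1, _h2, _p1, _p2;  // prefix hashes and powers per modulus

public:
    explicit DoubleHash(const std::string& s)
        : _h1(s.size() + 1, 0), _h2(s.size() + 1, 0), _p1(s.size() + 1, 1), _p2(s.size() + 1, 1) {
        for (size_t i = 0; i < s.size(); ++i) {
            unsigned long long c = static_cast<unsigned char>(s[i]);
            _h1[i + 1] = (_h1[i] * BASE1 + c) % MOD1;
            _h2[i + 1] = (_h2[i] * BASE2 + c) % MOD2;
            _p1[i + 1] = _p1[i] * BASE1 % MOD1;
            _p2[i + 1] = _p2[i] * BASE2 % MOD2;
        }
    }

    // Hash pair of s[l..r] (0-indexed, inclusive), packed into one 64-bit key.
    unsigned long long get(int l, int r) const {
        unsigned long long a = (_h1[r + 1] + MOD1 - _h1[l] * _p1[r - l + 1] % MOD1) % MOD1;
        unsigned long long b = (_h2[r + 1] + MOD2 - _h2[l] * _p2[r - l + 1] % MOD2) % MOD2;
        return (a << 32) | b;  // both a and b are below 2^30, so packing loses nothing
    }
};
```

Packing only works because both moduli are below 2^30. With larger moduli, return a `std::pair` instead.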

Longest Common Prefix (LeetCode 14)

// Vertical scanning: compare column by column across all strings
#include <string>
#include <vector>

std::string longestCommonPrefix(std::vector<std::string>& strs) {
    if (strs.empty()) return "";

    for (size_t i = 0; i < strs[0].size(); ++i) {
        char c = strs[0][i];
        for (size_t j = 1; j < strs.size(); ++j) {
            // Stop when some string runs out or differs at column i
            if (i == strs[j].size() || strs[j][i] != c) {
                return strs[0].substr(0, i);
            }
        }
    }
    return strs[0];
}

9.3 KMP: Matching Without Ever Moving the Text Pointer Backward

Core Idea

Where brute force wastes work: matching the pattern "ABCABD" against the text "ABCABCABD":

S: A B C A B C A B D
P: A B C A B D
             ↑ mismatch!

Brute force: shift P right by 1 and restart from P[0]
KMP: shift P right by 3 and continue from P[2] (because "AB" is both a prefix of P and a suffix of the part already matched)

The heart of KMP is the next array (the prefix function):

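next[i] = length of the longest proper prefix of P[0..i] that is also a suffix of P[0..i]
        → if j characters have matched and the next one fails, the pattern pointer falls back to next[j-1]
          (the pattern shifts right by j - next[j-1])

More formally:
next[i] = max{ k | 0 ≤ k ≤ i and P[0..k-1] == P[i-k+1..i] }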

Deriving the next Array

Example: P = "ABABC"

i=0: "A"         → no proper prefix equals a proper suffix → next[0] = 0
i=1: "AB"        → prefix [A] ≠ suffix [B] → next[1] = 0
i=2: "ABA"       → prefix [A] = suffix [A] → next[2] = 1
i=3: "ABAB"      → prefix [AB] = suffix [AB] → next[3] = 2
i=4: "ABABC"     → no equal prefix and suffix → next[4] = 0

next = [0, 0, 1, 2, 0]

[Figure: what the next array means for P = ABABC. next[0]=0: A (none); next[1]=0: AB (none); next[2]=1: ABA, prefix A = suffix A; next[3]=2: ABAB, prefix AB = suffix AB; next[4]=0: ABABC (none).]

Full Implementation

#include <vector>
#include <string>

// Build the next array (also called the pi array / prefix function)
std::vector<int> buildNext(const std::string& pattern) {
    int m = pattern.size();
    std::vector<int> next(m, 0);

    for (int i = 1; i < m; ++i) {
        // j = length of the border being extended (the prefix function of the previous position)
        int j = next[i - 1];

        // While the current character does not match, fall back along the next chain
        while (j > 0 && pattern[i] != pattern[j]) {
            j = next[j - 1];
        }

        if (pattern[i] == pattern[j]) {
            ++j;
        }
        next[i] = j;
    }
    return next;
}

// KMP string matching: returns the start index of every occurrence
std::vector<int> kmpSearch(const std::string& text, const std::string& pattern) {
    int n = text.size(), m = pattern.size();
    if (m == 0) return {};

    auto next = buildNext(pattern);
    std::vector<int> matches;

    int j = 0;  // number of pattern characters currently matched
    for (int i = 0; i < n; ++i) {
        // On a mismatch, fall back along next (j may fall back several times)
        while (j > 0 && text[i] != pattern[j]) {
            j = next[j - 1];
        }

        if (text[i] == pattern[j]) {
            ++j;
        }

        if (j == m) {  // full match
            matches.push_back(i - m + 1);
            j = next[j - 1];  // keep going to find the next (possibly overlapping) match
        }
    }
    return matches;
}

KMP matching example: S="ABCABCABD", P="ABCABD"

next=[0,0,0,1,2,0]

i=0..4: match "ABCAB" (j=5)
i=5: S[5]='C' ≠ P[5]='D' → j = next[4] = 2
     → the pattern shifts by 5-2=3 positions; continue from j=2
     → equivalent to shifting P right by 3 with the pattern pointer at index 2 ("C")

i=5: S[5]='C' = P[2]='C' → j=3
i=6: S[6]='A' = P[3]='A' → j=4
i=7: S[7]='B' = P[4]='B' → j=5
i=8: S[8]='D' = P[5]='D' → j=6==m → match found at position 8-6+1 = 3!

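💡 How to say it in an interview: "KMP runs in O(n+m). Building next takes O(m). In the matching phase, j increases at most n times (at most once per text character), and every fallback decreases it by at least 1, so the total number of fallbacks cannot exceed the total number of increments. Space is O(m). The key is the next array: for each prefix of the pattern, it stores the length of the longest proper prefix that is also a suffix. After a mismatch, that value tells us where to resume in the pattern, so the text pointer never moves backward."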
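You can also check the amortized argument empirically. This instrumented matching loop counts increments of j and fallback steps. The `kmpMatchCost` helper and `KmpCost` struct are hypothetical names. The reset after a full match counts as a fallback, because it also decreases j.

```cpp
#include <string>
#include <vector>

struct KmpCost { long long increments = 0, fallbacks = 0; };

// Counts how often j goes up and how often it goes down during matching.
// j starts at 0, never drops below 0, and every fallback strictly decreases it,
// so fallbacks <= increments <= text length.
KmpCost kmpMatchCost(const std::string& s, const std::string& p) {
    KmpCost cost;
    int m = static_cast<int>(p.size());
    if (m == 0) return cost;
    std::vector<int> next(m, 0);
    for (int i = 1, j = 0; i < m; ++i) {
        while (j > 0 && p[i] != p[j]) j = next[j - 1];
        if (p[i] == p[j]) ++j;
        next[i] = j;
    }
    for (int i = 0, j = 0; i < static_cast<int>(s.size()); ++i) {
        while (j > 0 && s[i] != p[j]) { j = next[j - 1]; ++cost.fallbacks; }
        if (s[i] == p[j]) { ++j; ++cost.increments; }
        if (j == m) { j = next[j - 1]; ++cost.fallbacks; }  // post-match reset also lowers j
    }
    return cost;
}
```

On a 1001-character text of 1000 'a's followed by 'b', with pattern "aaab", there are 1001 increments and fewer fallbacks. The total work stays linear even though the mismatch happens at almost every position.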

KMP: A Version to Memorize and Hand-Write

// Minimal interview version: building next and matching share the same loop template
// next[i] = length of the longest proper prefix of needle[0..i] that is also its suffix
#include <string>
#include <vector>

int strStr(std::string haystack, std::string needle) {
    if (needle.empty()) return 0;
    int n = haystack.size(), m = needle.size();

    // 1. Build next
    std::vector<int> next(m, 0);
    for (int i = 1, j = 0; i < m; ++i) {
        while (j > 0 && needle[i] != needle[j]) j = next[j - 1];
        if (needle[i] == needle[j]) ++j;
        next[i] = j;
    }

    // 2. Match
    for (int i = 0, j = 0; i < n; ++i) {
        while (j > 0 && haystack[i] != needle[j]) j = next[j - 1];
        if (haystack[i] == needle[j]) ++j;
        if (j == m) return i - m + 1;  // first occurrence
    }
    return -1;
}

9.4 Rabin-Karp: Fast Comparison with Hashes

The core idea of Rabin-Karp: replace an O(m) string comparison with an O(1) hash comparison.

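Naive: each alignment needs an O(m) character-by-character comparison
RK:    compare hashes in O(1) first → compare characters only when the hashes match

Rolling hash: given hash(s[l..r]), compute hash(s[l+1..r+1]) in O(1)
  hash(s[l+1..r+1]) = (hash(s[l..r]) - s[l]*BASE^{len-1}) * BASE + s[r+1]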

#include <string>
#include <vector>

std::vector<int> rabinKarp(const std::string& text, const std::string& pattern) {
    int n = text.size(), m = pattern.size();
    if (m == 0 || m > n) return {};

    using ULL = unsigned long long;  // arithmetic wraps around, i.e. implicitly mod 2^64
    const ULL BASE = 131;

    // Precompute BASE^{m-1}
    ULL power = 1;
    for (int i = 0; i < m - 1; ++i) power *= BASE;

    // Hash of the pattern
    ULL patternHash = 0;
    for (unsigned char c : pattern) patternHash = patternHash * BASE + c;

    // Hash of the first text window
    ULL windowHash = 0;
    for (int i = 0; i < m; ++i) windowHash = windowHash * BASE + static_cast<unsigned char>(text[i]);

    std::vector<int> matches;

    // Slide the window
    for (int i = 0; i <= n - m; ++i) {
        if (windowHash == patternHash) {
            // Hashes match → verify character by character (guards against collisions)
            bool match = true;
            for (int j = 0; j < m; ++j) {
                if (text[i + j] != pattern[j]) {
                    match = false;
                    break;
                }
            }
            if (match) matches.push_back(i);
        }

        // Roll the hash: drop text[i], append text[i+m]
        if (i < n - m) {
            windowHash = (windowHash - static_cast<unsigned char>(text[i]) * power) * BASE
                         + static_cast<unsigned char>(text[i + m]);
        }
    }
    return matches;
}

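⚠️ Hash collisions can degrade Rabin-Karp to O(n*m) in the worst case, because every window whose hash matches must be verified. With a good hash this is rare in practice. The code above already relies on ULL wrap-around. To harden it against adversarial input, use a random base with a prime modulus, or use double hashing.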

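Aspect	KMP	Rabin-Karp
Core idea	Reuse failure information to skip	Compare hashes quickly
Worst-case time	O(n+m), guaranteed	O(n*m) when hashes collide
Speed in practice	Fast and predictable	Fast on average; depends on the collision rate
Extra space	O(m) for the next array	O(1) besides the hash values
Extends to	AC automaton	Multi-pattern hashing (patterns of equal length)
Hand-written in interviews	Common (tests the idea)	Less common (considered too simple)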
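The table mentions extending Rabin-Karp to multiple patterns. A common form is sketched here, assuming every pattern has the same length m. It stores all pattern hashes in a hash table, rolls a single window over the text, and verifies on every hash hit. `rabinKarpMulti` is a hypothetical name. In this sketch, patterns whose length differs from the first pattern's are simply skipped.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Returns (text position, pattern index) for every verified occurrence.
std::vector<std::pair<int, int>> rabinKarpMulti(const std::string& text,
                                                const std::vector<std::string>& patterns) {
    using ULL = unsigned long long;  // wrap-around arithmetic, mod 2^64
    const ULL BASE = 131;
    std::vector<std::pair<int, int>> result;
    if (patterns.empty()) return result;
    int n = text.size(), m = patterns[0].size();
    if (m == 0 || m > n) return result;

    auto hashOf = [&](const std::string& s, int pos) {  // hash of s[pos..pos+m-1]
        ULL h = 0;
        for (int k = 0; k < m; ++k) h = h * BASE + static_cast<unsigned char>(s[pos + k]);
        return h;
    };
    std::unordered_multimap<ULL, int> table;  // pattern hash → pattern index
    for (int id = 0; id < static_cast<int>(patterns.size()); ++id)
        if (static_cast<int>(patterns[id].size()) == m) table.emplace(hashOf(patterns[id], 0), id);

    ULL power = 1;  // BASE^{m-1}
    for (int k = 0; k < m - 1; ++k) power *= BASE;
    ULL h = hashOf(text, 0);
    for (int i = 0; i + m <= n; ++i) {
        auto range = table.equal_range(h);
        for (auto it = range.first; it != range.second; ++it)  // verify every hash hit
            if (text.compare(i, m, patterns[it->second]) == 0) result.emplace_back(i, it->second);
        if (i + m < n)
            h = (h - static_cast<unsigned char>(text[i]) * power) * BASE
                + static_cast<unsigned char>(text[i + m]);
    }
    return result;
}
```

For patterns of mixed lengths, the usual routes are one table per distinct length or the AC automaton of Section 9.5.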

9.5 The AC Automaton: Multi-Pattern Matching

AC automaton = Trie + KMP-style fail links. It scans the text once and matches all patterns simultaneously.

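Trie:          shares common prefixes, storing many patterns compactly
KMP:           jumps via the next array on a mismatch
AC automaton:  builds a fail link for every Trie node
               fail[node] = the Trie node for the longest proper suffix of node's string that is also a Trie path (a prefix of some pattern)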

[Figure: AC automaton for the patterns he, she, his, hers. Trie edges: root→h→he*→her→hers*, h→hi→his*, root→s→sh→she*. Fail links (dashed): sh→h, she→he, his→s, hers→s; every other node's fail link points to root.]

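* marks nodes where a pattern ends. How fail links are used: on a mismatch, jump along fail to the node for the longest suffix that can still match, and try again from there.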
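You can cross-check fail links by brute force on a small pattern set. This sketch treats every pattern prefix as a node. For each node, it takes the longest proper suffix that is also a node, with the empty string standing for root. `failByDefinition` is an illustrative helper, not part of the original code, and it is only practical for tiny inputs.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// fail(u) = longest proper suffix of u that is also a Trie node ("" = root).
std::map<std::string, std::string> failByDefinition(const std::vector<std::string>& patterns) {
    std::set<std::string> nodes{""};  // every prefix of every pattern, plus root
    for (const auto& p : patterns)
        for (size_t len = 1; len <= p.size(); ++len) nodes.insert(p.substr(0, len));

    std::map<std::string, std::string> fail;
    for (const auto& u : nodes) {
        if (u.empty()) continue;  // root has no fail link
        for (size_t drop = 1; drop <= u.size(); ++drop) {  // longest proper suffix first
            std::string suffix = u.substr(drop);
            if (nodes.count(suffix)) { fail[u] = suffix; break; }  // "" always matches
        }
    }
    return fail;
}
```

For he, she, his, hers, it yields sh→h, she→he, his→s, hers→s, and root for every other node.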

AC Automaton Implementation

#include <vector>
#include <string>
#include <queue>

// Assumes patterns contain only lowercase 'a'..'z'
struct AhoCorasick {
    static constexpr int ALPHABET = 26;

    struct Node {
        int next[ALPHABET] = {};  // children (0 = none; root is never a child)
        int fail = 0;             // fail link
        int output = 0;           // number of patterns ending here; after build(), also those on the fail chain
                                  // (could be a list of pattern IDs instead)
    };

    std::vector<Node> trie;

    AhoCorasick() : trie(1) {}  // root at index 0

    // Insert one pattern
    void insert(const std::string& pattern) {
        int node = 0;
        for (char c : pattern) {
            int idx = c - 'a';
            if (trie[node].next[idx] == 0) {
                trie[node].next[idx] = trie.size();
                trie.emplace_back();
            }
            node = trie[node].next[idx];
        }
        ++trie[node].output;  // mark the end of a pattern
    }

    // Build fail links (BFS, so shallower nodes are finished first)
    void build() {
        std::queue<int> q;

        // Depth 1: every child of root has root as its fail link
        for (int c = 0; c < ALPHABET; ++c) {
            int child = trie[0].next[c];
            if (child) {
                trie[child].fail = 0;
                q.push(child);
            }
        }

        while (!q.empty()) {
            int node = q.front(); q.pop();

            for (int c = 0; c < ALPHABET; ++c) {
                int child = trie[node].next[c];
                if (!child) continue;

                // Fail link of child: walk the parent's fail chain
                // to the first node that also has a child on c
                int f = trie[node].fail;
                while (f != 0 && trie[f].next[c] == 0) {
                    f = trie[f].fail;
                }
                trie[child].fail = trie[f].next[c];  // 0 (root) if even root has no child on c

                // Merge output: child also matches every pattern on its fail path
                trie[child].output += trie[trie[child].fail].output;

                q.push(child);
            }
        }
    }

    // Count the occurrences of all patterns in text
    int search(const std::string& text) {
        int node = 0;
        int totalMatches = 0;

        for (char ch : text) {
            // Characters outside 'a'..'z' cannot be part of any pattern: reset to root
            // (indexing next[] with them would be out of bounds)
            if (ch < 'a' || ch > 'z') { node = 0; continue; }
            int c = ch - 'a';

            // On a mismatch, follow fail links
            while (node != 0 && trie[node].next[c] == 0) {
                node = trie[node].fail;
            }

            node = trie[node].next[c];
            totalMatches += trie[node].output;
        }
        return totalMatches;
    }
};
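The `AhoCorasick` above only counts matches, and its comment notes that the output could be a list of pattern IDs. Below is a sketch of a common way to report which pattern matched where, still in O(n + matches) for the scan. It is written as a separate, hypothetical class `AhoCorasickReport` and makes two standard changes. First, missing transitions are filled in during BFS, so the search needs no while loop. Second, a `dict` link points to the nearest node on the fail chain where a pattern ends, so reporting never visits non-matching nodes. It assumes lowercase patterns; any other text character resets the automaton to root.

```cpp
#include <array>
#include <queue>
#include <string>
#include <utility>
#include <vector>

class AhoCorasickReport {
    static constexpr int ALPHABET = 26;
    struct Node {
        std::array<int, ALPHABET> next{};  // after build(): complete transition table (0 = root)
        int fail = 0;                      // fail link
        int dict = -1;                     // nearest pattern-ending node on the fail chain, -1 if none
        int patternId = -1;                // pattern ending exactly here, -1 if none
        int depth = 0;                     // length of the string this node represents
    };
    std::vector<Node> _trie;

public:
    AhoCorasickReport() : _trie(1) {}  // root at index 0

    // Call before build(); a duplicate pattern keeps the last id
    void insert(const std::string& pattern, int id) {
        int node = 0;
        for (char ch : pattern) {
            int c = ch - 'a';
            if (_trie[node].next[c] == 0) {
                _trie[node].next[c] = static_cast<int>(_trie.size());
                _trie.emplace_back();
                _trie.back().depth = _trie[node].depth + 1;
            }
            node = _trie[node].next[c];
        }
        _trie[node].patternId = id;
    }

    void build() {
        std::queue<int> q;
        for (int c = 0; c < ALPHABET; ++c)
            if (_trie[0].next[c]) q.push(_trie[0].next[c]);  // depth 1: fail = root (default)
        while (!q.empty()) {
            int node = q.front(); q.pop();
            int f = _trie[node].fail;  // shallower, so already finished in BFS order
            _trie[node].dict = _trie[f].patternId >= 0 ? f : _trie[f].dict;
            for (int c = 0; c < ALPHABET; ++c) {
                int child = _trie[node].next[c];
                if (child) {
                    _trie[child].fail = _trie[f].next[c];
                    q.push(child);
                } else {
                    _trie[node].next[c] = _trie[f].next[c];  // fill in the missing transition
                }
            }
        }
    }

    // (start index, pattern id) for every occurrence
    std::vector<std::pair<int, int>> search(const std::string& text) const {
        std::vector<std::pair<int, int>> hits;
        int node = 0;
        for (int i = 0; i < static_cast<int>(text.size()); ++i) {
            char ch = text[i];
            if (ch < 'a' || ch > 'z') { node = 0; continue; }
            node = _trie[node].next[ch - 'a'];
            // Visit only pattern-ending nodes: this one (if it ends a pattern), then dict links
            for (int v = _trie[node].patternId >= 0 ? node : _trie[node].dict; v > 0; v = _trie[v].dict)
                hits.emplace_back(i - _trie[v].depth + 1, _trie[v].patternId);
        }
        return hits;
    }
};
```

With patterns he, she, his, hers (IDs 0 to 3) and the text "ushers", it reports (1, 1) for "she", (2, 0) for "he", and (2, 3) for "hers".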

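💡 How to say it in an interview: "AC automaton = Trie + KMP fail links. Fail links are built with BFS: a node's child gets as its fail link the child, on the same character, of the first node along the parent's fail chain that has such a child. During search we follow fail links on mismatches, so one scan matches all patterns. Searching takes O(n + total_matches) when every match is reported, and building takes O(total pattern length × alphabet size)."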

AC Automaton vs. Running KMP Once per Pattern

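Scenario: 1000 banned words, and every chat message must be checked

KMP per word:   1000 passes per message → O(1000 × message_len + total banned-word length)
AC automaton:   one pass per message → O(message_len + number of matches), after building once
                roughly 1000× fewer character steps per message (ignoring constant factors and build cost)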

9.6 Manacher: Longest Palindromic Substring in O(n)

Manacher's key idea is to exploit the symmetry of palindromes already found, so that no work is repeated.

// Manacher: longest palindromic substring in O(n)
// Preprocessing: insert '#' between characters so odd and even palindromes are handled the same way
// "aba"  → "#a#b#a#"
// "aa"   → "#a#a#"
#include <algorithm>
#include <string>
#include <vector>

std::string longestPalindromeManacher(const std::string& s) {
    if (s.empty()) return "";

    // Preprocess
    std::string t = "#";
    for (char c : s) {
        t += c;
        t += '#';
    }

    int n = t.size();
    std::vector<int> p(n, 0);   // p[i] = radius of the palindrome centered at i in t (= its length in s)
    int center = 0, right = 0;  // center and right boundary of the rightmost palindrome so far
    int maxLen = 0, maxCenter = 0;

    for (int i = 0; i < n; ++i) {
        // Key optimization: initialize p[i] from its mirror image
        int mirror = 2 * center - i;
        if (i < right) {
            p[i] = std::min(right - i, p[mirror]);
        }

        // Expand around the center
        while (i - p[i] - 1 >= 0 && i + p[i] + 1 < n &&
               t[i - p[i] - 1] == t[i + p[i] + 1]) {
            ++p[i];
        }

        // Update the rightmost palindrome boundary
        if (i + p[i] > right) {
            center = i;
            right = i + p[i];
        }

        if (p[i] > maxLen) {
            maxLen = p[i];
            maxCenter = i;
        }
    }

    // Map the palindrome in t back to s: its start in s is (maxCenter - maxLen) / 2
    int start = (maxCenter - maxLen) / 2;
    return s.substr(start, maxLen);
}

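Manacher's key insight:
when i lies inside the rightmost palindrome found so far (i < right),
the palindrome centered at i mirrors the one centered at mirror = 2*center - i.

p[mirror] is already known → reuse it (capped at right-i, because nothing beyond right is known yet)
This avoids most of the redundant center expansions.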

9.7 Frequently Asked Interview Problems

Problem 1: Repeated DNA Sequences (LeetCode 187)

// Find every 10-letter substring that occurs more than once
// A classic rolling-hash (Rabin-Karp style) application: rolling encoding + frequency count
#include <string>
#include <unordered_map>
#include <vector>

std::vector<std::string> findRepeatedDnaSequences(std::string s) {
    if (s.size() < 10) return {};

    std::unordered_map<int, int> count;  // encoding → number of occurrences
    std::vector<std::string> result;

    // 2 bits per base for A/C/G/T → a 10-mer fits in 20 bits.
    // The encoding is one-to-one, so equal codes mean equal strings (no collision check needed).
    const int L = 10;
    std::unordered_map<char, int> code = {{'A', 0}, {'C', 1}, {'G', 2}, {'T', 3}};
    const int mask = (1 << (2 * L)) - 1;  // lowest 20 bits set

    int hash = 0;
    for (int i = 0; i < (int)s.size(); ++i) {
        hash = ((hash << 2) | code[s[i]]) & mask;  // rolling update: shift in the new base, drop the oldest
        if (i >= L - 1) {
            if (++count[hash] == 2) {  // report each repeated sequence exactly once
                result.push_back(s.substr(i - L + 1, L));
            }
        }
    }
    return result;
}

Problem 2: Shortest Palindrome (LeetCode 214)

// Prepend the fewest characters to make s a palindrome
// KMP idea: the longest palindromic prefix of s is the longest prefix of s that is also a
// suffix of reverse(s); it equals the last prefix-function value of s + "#" + reverse(s)
#include <string>
#include <vector>

std::string shortestPalindrome(std::string s) {
    if (s.empty()) return "";

    std::string rev(s.rbegin(), s.rend());
    std::string combined = s + "#" + rev;  // separator (assumed absent from s) stops a match from running past s

    // Prefix function (next array) of combined
    int m = combined.size();
    std::vector<int> next(m, 0);
    for (int i = 1; i < m; ++i) {
        int j = next[i - 1];
        while (j > 0 && combined[i] != combined[j]) {
            j = next[j - 1];
        }
        if (combined[i] == combined[j]) ++j;
        next[i] = j;
    }

    // next.back() = length of the longest palindromic prefix of s
    int prefixLen = next.back();
    return rev.substr(0, s.size() - prefixLen) + s;
}

Problem 3: Multiply Strings (LeetCode 43)

// Big-number multiplication: simulate grade-school long multiplication
#include <string>
#include <vector>

std::string multiply(std::string num1, std::string num2) {
    if (num1 == "0" || num2 == "0") return "0";

    int m = num1.size(), n = num2.size();
    std::vector<int> result(m + n, 0);  // num1[i] * num2[j] lands in result[i+j] and result[i+j+1]

    // From the lowest digit to the highest
    for (int i = m - 1; i >= 0; --i) {
        for (int j = n - 1; j >= 0; --j) {
            int mul = (num1[i] - '0') * (num2[j] - '0');
            int sum = mul + result[i + j + 1];  // add what has already accumulated at this position

            result[i + j + 1] = sum % 10;        // current digit
            result[i + j]     += sum / 10;       // carry
        }
    }

    // Convert to a string
    std::string ans;
    for (int digit : result) {
        if (!(ans.empty() && digit == 0)) {  // skip leading zeros
            ans += static_cast<char>('0' + digit);
        }
    }
    return ans;
}

Quick Reference

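#	Problem	LeetCode	Difficulty	Key technique
1	Implement strStr()	28	Easy	KMP or brute force
2	Longest Common Prefix	14	Easy	Vertical scanning
3	Repeated DNA Sequences	187	Medium	Rolling hash + 2-bit encoding
4	Longest Palindromic Substring	5	Medium	Manacher / DP / center expansion
5	Multiply Strings	43	Medium	Grade-school multiplication
6	Shortest Palindrome	214	Hard	Clever use of the KMP next array
7	Implement Trie	208	Medium	Review of Data Structures Ch8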

9.8 🎮 Game Development in Practice

9.8.1 Chat Profanity Filtering: KMP + AC Automaton

// Single banned word: KMP (kmpSearch from Section 9.3)
bool containsBadWord_KMP(const std::string& message,
                         const std::string& badWord) {
    return !kmpSearch(message, badWord).empty();
}

// Many banned words: AC automaton (AhoCorasick from Section 9.5), one scan matches them all.
// That implementation only supports lowercase 'a'..'z' patterns.
class SensitiveWordFilter {
    AhoCorasick _ac;

public:
    void loadWords(const std::vector<std::string>& badWords) {
        for (const auto& word : badWords) {
            _ac.insert(word);
        }
        _ac.build();  // build fail links once, after all words are inserted
    }

    int countMatches(const std::string& message) {
        return _ac.search(message);
    }

    bool isClean(const std::string& message) {
        return countMatches(message) == 0;
    }
};

9.8.2 Asset File Deduplication: Polynomial-Hash Fingerprints

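// Deduplicate large asset files by content: compute a quick fingerprint with a polynomial hash.
// The whole file is never loaded; only the first and last sampleSize bytes and the file size are hashed.
// Files that differ only in the middle get the same fingerprint, so treat a match as
// "probably identical" and confirm with a full byte-by-byte comparison.
#include <algorithm>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

uint64_t fileFingerprint(const std::string& filePath, std::size_t sampleSize = 4096) {
    std::ifstream file(filePath, std::ios::binary);
    if (!file) return 0;  // note: 0 could also be the fingerprint of a real file

    // Get the file size first, so the tail offset can be computed
    file.seekg(0, std::ios::end);
    const std::streamoff fileSize = file.tellg();

    const uint64_t BASE = 131;
    uint64_t hash = 0;
    std::vector<char> buffer(sampleSize);  // sized to sampleSize, so large samples cannot overflow it

    // Mix up to sampleSize bytes starting at `offset` into the running hash
    auto mixBlock = [&](std::streamoff offset) {
        file.clear();  // a previous short read may have set eofbit/failbit
        file.seekg(offset, std::ios::beg);
        file.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        for (std::streamsize i = 0; i < file.gcount(); ++i) {
            hash = hash * BASE + static_cast<unsigned char>(buffer[i]);
        }
    };

    mixBlock(0);  // head
    mixBlock(std::max<std::streamoff>(0, fileSize - static_cast<std::streamoff>(sampleSize)));  // tail

    // Mix in the file size
    hash = hash * BASE + static_cast<uint64_t>(fileSize);
    return hash;
}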

9.8.3 Asset Path Prefix Index: A Trie with Hashed Path Components

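// Prefix lookup on file-system paths, e.g. every asset under "/assets/textures/ui/".
// A Trie keyed by path components; each node's children live in a hash map (component → node).
#include <string>
#include <unordered_map>
#include <vector>

class ResourcePathIndex {
    struct PathNode {
        std::unordered_map<std::string, int> children;  // component name → child node
        std::vector<int> assets;                        // asset IDs stored exactly at this path
    };

    std::vector<PathNode> _nodes;

    // Split on '/', skipping empty components (leading, trailing, or repeated slashes)
    static std::vector<std::string> split(const std::string& path) {
        std::vector<std::string> parts;
        size_t start = 0;
        while (start < path.size()) {
            size_t slash = path.find('/', start);
            if (slash == std::string::npos) slash = path.size();  // last component (npos + 1 would wrap to 0)
            if (slash > start) parts.push_back(path.substr(start, slash - start));
            start = slash + 1;
        }
        return parts;
    }

    // Collect the assets of `node` and of every node below it
    void collect(int node, std::vector<int>& out) const {
        out.insert(out.end(), _nodes[node].assets.begin(), _nodes[node].assets.end());
        for (const auto& kv : _nodes[node].children) collect(kv.second, out);
    }

public:
    ResourcePathIndex() : _nodes(1) {}

    void insert(const std::string& path, int assetId) {
        int node = 0;
        for (const std::string& dir : split(path)) {
            auto it = _nodes[node].children.find(dir);
            if (it != _nodes[node].children.end()) {
                node = it->second;
                continue;
            }
            int child = static_cast<int>(_nodes.size());
            _nodes[node].children.emplace(dir, child);
            _nodes.emplace_back();  // may reallocate, so no references into _nodes are kept across it
            node = child;
        }
        _nodes[node].assets.push_back(assetId);
    }

    // All assets at or below `directory` (recursive)
    std::vector<int> query(const std::string& directory) const {
        int node = 0;
        for (const std::string& dir : split(directory)) {
            auto it = _nodes[node].children.find(dir);
            if (it == _nodes[node].children.end()) return {};  // path does not exist
            node = it->second;
        }
        std::vector<int> result;
        collect(node, result);
        return result;
    }
};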

9.8.4 Palindrome Detection: Manacher in Chat

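// Detect players spamming with palindromic patterns,
// e.g. "abccbaabccbaabccba" → a repeated palindromic structure
#include <string>

struct PalindromeInfo {
    int start;               // start index of the longest palindrome in text (-1 for empty text)
    int length;              // its length
    std::string palindrome;  // the palindrome itself
};

PalindromeInfo longestPalindrome(const std::string& text) {
    // Uses longestPalindromeManacher from Section 9.6, which returns the palindrome string.
    // Every occurrence of that string is a longest palindrome, so find() recovers a valid (leftmost) position.
    std::string result = longestPalindromeManacher(text);
    int start = result.empty() ? -1 : static_cast<int>(text.find(result));
    // The caller flags the message as potential spam when length exceeds its threshold.
    return {start, static_cast<int>(result.size()), result};
}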

9.9 30-Second Answers

Q: What is the core idea of KMP?

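KMP uses the matching prefix and suffix of the already-matched part so that the text pointer never moves backward. next[i] stores the length of the longest proper prefix of P[0..i] that is also a suffix of it. After a mismatch with j characters matched, the text pointer stays put and the pattern pointer jumps to next[j-1], which is equivalent to shifting the pattern right by j - next[j-1]. Building next is O(m) and matching is O(n), for O(n+m) overall.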

Q: How is the Rabin-Karp rolling hash implemented?

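Maintain the hash of a window of length m. When the window slides right: newHash = (oldHash - s[left]*BASE^{m-1}) * BASE + s[right], where s[left] is the character leaving the window and s[right] the one entering it. The arithmetic is done modulo a prime, or mod 2^64 via unsigned long long wrap-around. Wrap-around is fast but can be attacked, so use double hashing when inputs may be adversarial. A window is verified character by character only when its hash equals the pattern's hash.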

Q: How does the AC automaton relate to KMP?

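AC automaton = Trie + KMP fail links. KMP's next array jumps along a single chain. The AC automaton generalizes fail links to a Trie: each node's fail link points to the node for its longest proper suffix that is also in the Trie. Fail links are built with BFS. Searching follows them on mismatches, so one scan matches all patterns simultaneously.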

Q: Why is Manacher O(n)?

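It exploits palindrome symmetry. When i lies inside the current rightmost palindrome (i < right), p[i] can start from its mirror's value p[mirror], capped at right-i. Center expansion then starts from this reused value, and each successful comparison extends a palindrome past the current right boundary. Since right only grows, from 0 to at most n, the total number of expansion steps is O(n).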
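You can check the O(n) argument by counting. This instrumented variant counts only the successful comparisons in the expansion loop; `manacherExpansions` is a hypothetical name. By the argument above, the count stays below |t| = 2|s| + 1 even on an all-'a' string, where naive center expansion would be quadratic.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Manacher's loop, counting successful expansion steps.
// Each success pushes some i + p[i] beyond the current right boundary,
// and right never exceeds t.size() - 1, so the total stays below t.size().
long long manacherExpansions(const std::string& s) {
    std::string t = "#";
    for (char c : s) { t += c; t += '#'; }
    int n = t.size();
    std::vector<int> p(n, 0);
    long long successes = 0;
    for (int i = 0, center = 0, right = 0; i < n; ++i) {
        if (i < right) p[i] = std::min(right - i, p[2 * center - i]);
        while (i - p[i] - 1 >= 0 && i + p[i] + 1 < n && t[i - p[i] - 1] == t[i + p[i] + 1]) {
            ++p[i];
            ++successes;
        }
        if (i + p[i] > right) { center = i; right = i + p[i]; }
    }
    return successes;
}
```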

Q: When should you use KMP and when Rabin-Karp?

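KMP guarantees O(n+m) in the worst case without relying on hashes, so it suits cases that need a worst-case bound. Rabin-Karp is simple to implement and fast on average, and it extends naturally to 2D pattern matching and file fingerprints. Interviews ask about KMP more often, because it tests the idea. In production code the standard library often suffices: std::string::find, std::search, or the C++17 searchers std::boyer_moore_searcher and std::boyer_moore_horspool_searcher.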
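For the library route, C++17 lets `std::search` take a searcher object. Below is a minimal sketch that uses `std::boyer_moore_horspool_searcher` (from `<functional>`) to collect all occurrences, including overlapping ones; `findAllStd` is an illustrative name. Boyer-Moore-Horspool is a different algorithm from both KMP and Rabin-Karp, and its worst case is not linear. Treat it as a convenience, not a worst-case guarantee.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// All (possibly overlapping) occurrences of pattern in text, via a C++17 searcher.
std::vector<int> findAllStd(const std::string& text, const std::string& pattern) {
    std::vector<int> result;
    if (pattern.empty()) return result;
    // The searcher preprocesses the pattern once and is reused for every call below
    std::boyer_moore_horspool_searcher searcher(pattern.begin(), pattern.end());
    auto it = text.begin();
    while (true) {
        it = std::search(it, text.end(), searcher);
        if (it == text.end()) break;
        result.push_back(static_cast<int>(it - text.begin()));
        ++it;  // restart one past the match start so overlapping matches are found
    }
    return result;
}
```

For "aaaa" and "aa", it returns [0, 1, 2]. For the KMP example S="ABCABCABD" and P="ABCABD", it returns [3].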

9.10 Chapter Exercise List

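String basics:
  □ 28.  Implement strStr(): hand-write KMP
  □ 14.  Longest Common Prefix: vertical scanning

KMP:
  □ 459. Repeated Substring Pattern: clever use of the KMP next array
  □ 214. Shortest Palindrome: KMP + reversal

Rabin-Karp / hashing:
  □ 187. Repeated DNA Sequences: rolling hash
  □ 1044. Longest Duplicate Substring: binary search + hashing

Palindromes:
  □ 5.   Longest Palindromic Substring: Manacher / DP / center expansion
  □ 647. Palindromic Substrings: Manacher / center expansion

Big numbers:
  □ 43.  Multiply Strings: grade-school multiplication
  □ 415. Add Strings: big-number addition

Trie review:
  □ 208. Implement Trie: review of Data Structures Ch8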

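📖 End of the series. The nine chapters: Ch1 Sorting → Ch2 Binary Search → Ch3 DP Basics → Ch4 Advanced DP → Ch5 Greedy → Ch6 Backtracking and A* → Ch7 Math → Ch8 Bit Manipulation → Ch9 Strings.
About 110 interview problems in total, covering nine areas: sorting, binary search, DP, greedy, backtracking, search, math, bit manipulation, and strings. Best read together with the Data Structures series.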
