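Chapter 9: Union-Find (Disjoint Set Union)

In one sentence: union-find is a data structure built for "dynamic connectivity". It supports two operations in near-O(1) amortized time: finding the representative of the set an element belongs to (Find, which also tells you whether two elements share a set) and merging two sets (Union).

9.1 Intuition: What & Why

The "find the boss" analogy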

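Picture a world of gangs:
  Everyone has a "boss" (parent).
  The top boss has no boss above them → they are the gang's "root".

  Question: are Zhang San and Li Si in the same gang?
    → Follow each person's chain of bosses up to the top boss (Find).
    → Same top boss → same gang ✅
    → Different top bosses → different gangs ❌

  Operation: merge two gangs (Union).
    → One gang's top boss accepts the other gang's top boss as their boss.
    → Two gangs become one.

  That is union-find.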

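Where the problem comes from

Given n elements, each initially in its own set, we need two core operations plus one query derived from them:

Union(x, y): merge the sets containing x and y
Find(x): return the representative (root) of the set containing x
Connected(x, y): report whether x and y are in the same set, i.e. Find(x) == Find(y)

Approach	Union	Find	Notes
Flat id array	O(n)	O(1)	Quick Find (every merge scans all elements)
Linked list	O(1)*	O(n)	Every Find walks to the end of the list
Union-find	≈ O(1)	≈ O(1)	Path compression + union by rank → amortized O(α(n))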

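Equivalence classes / connected components

What union-find really maintains is an equivalence relation, that is, a relation that is reflexive, symmetric, and transitive:

Example: friend circles (transitivity: A and B are friends, B and C are friends → A and C are in the same circle)

  Initially: {A}, {B}, {C}, {D}, {E}

  Union(A, B) → {A, B}, {C}, {D}, {E}
  Union(C, D) → {A, B}, {C, D}, {E}
  Union(B, D) → {A, B, C, D}, {E}    ← A and C are now connected too

  Find(A) == Find(C) → true  (same connected component)
  Find(A) == Find(E) → false (different connected components)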

9.2 Structure Diagrams

Tree structure of union-find

Union-find is represented as a forest: each tree is one set, and its root is the representative.

Initially (each element is its own tree):
  0   1   2   3   4   5   6   7

Union(0,1): parent of 1 = 0
  0   2   3   4   5   6   7
  |
  1

Union(2,3): parent of 3 = 2
  0   2   4   5   6   7
  |   |
  1   3

Union(0,2): parent of 2 = 0  (hang 2's tree under 0)
    0       4   5   6   7
   / \
  1   2
      |
      3

Union(4,5), Union(6,7), Union(4,6):
    0           4
   / \         / \
  1   2       5   6
      |           |
      3           7

Now: Find(3) → walk 3→2→0 → returns 0
     Find(7) → walk 7→6→4 → returns 4
     Connected(1, 3) → Find(1)==0, Find(3)==0 → true ✅
     Connected(1, 7) → Find(1)==0, Find(7)==4 → false ❌

Underlying storage: a single array is enough

parent array: parent[i] = parent of i; if parent[i] == i, then i is a root

index:    0  1  2  3  4  5  6  7
parent:  [0, 0, 0, 2, 4, 4, 4, 6]

Tree view:    0           4
             / \         / \
            1   2       5   6
                |           |
                3           7

Find(3): parent[3]=2 → parent[2]=0 → parent[0]=0 (root) → returns 0
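As a minimal sketch of the storage idea above, the parent array from the diagram can be walked directly with a free function. The names `find_root`, `same_set`, and `kExampleParent` are illustrative assumptions and are not part of this chapter's classes.

```cpp
#include <vector>

// Hypothetical helper: follow parent links until reaching a node that is its own parent.
int find_root(const std::vector<int>& parent, int x) {
    while (parent[x] != x) x = parent[x];
    return x;
}

// Two elements share a set exactly when they reach the same root.
bool same_set(const std::vector<int>& parent, int a, int b) {
    return find_root(parent, a) == find_root(parent, b);
}

// The two-tree forest drawn above, stored as one array.
const std::vector<int> kExampleParent = {0, 0, 0, 2, 4, 4, 4, 6};
```

With this array, `find_root(kExampleParent, 3)` walks 3 → 2 → 0, matching the trace above. Nothing is rewritten during the walk, so this is the unoptimized Find.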

9.3 C++ Implementation

9.3.1 Quick Find (naive version, for background only)

#include <numeric>  // std::iota
#include <vector>

class QuickFind {
    std::vector<int> _id;  // _id[i] = label of the set containing i

public:
    explicit QuickFind(int n) : _id(n) {
        std::iota(_id.begin(), _id.end(), 0);  // every element starts in its own set
    }

    int find(int x) const { return _id[x]; }  // O(1)

    void unite(int x, int y) {
        int id_x = _id[x], id_y = _id[y];
        if (id_x == id_y) return;

        // Relabel every element carrying id_y to id_x → O(n)
        for (int& id : _id) {
            if (id == id_y) id = id_x;
        }
    }

    bool connected(int x, int y) const { return _id[x] == _id[y]; }
};

The problem with Quick Find: Find is a fast O(1), but Union is O(n) because every merge scans the whole array. With n Union operations, the total time is O(n²).

9.3.2 Quick Union (tree structure)

#include <numeric>  // std::iota
#include <vector>

class QuickUnion {
    std::vector<int> _parent;

public:
    explicit QuickUnion(int n) : _parent(n) {
        std::iota(_parent.begin(), _parent.end(), 0);  // parent[i] = i
    }

    int find(int x) {
        while (_parent[x] != x) {
            x = _parent[x];  // follow parent links up to the root
        }
        return x;
    }

    void unite(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx != ry) {
            _parent[ry] = rx;  // hang y's root under x's root
        }
    }

    bool connected(int x, int y) { return find(x) == find(y); }
};

The problem with Quick Union: the cost of Find depends on the height of the tree. In the worst case the tree degenerates into a linked list → Find is O(n).

Degenerate example:
  Union(1,0), Union(2,1), Union(3,2), Union(4,3)

  Each call hangs the existing root under the new element, so:
  0 → 1 → 2 → 3 → 4    (a linked list, rooted at 4)

  Find(0): 0→1→2→3→4 → O(n)

9.3.3 Path Compression

Core optimization: during Find, re-point every node along the path directly at the root. The next Find on any of those nodes is then O(1).

Before compression:           After compression:
      0                            0
      |                         / | \ \
      1                        1  2  3  4
      |
      2
      |
      3
      |
      4

Find(4): 4→3→2→1→0            Find(4): 4→0 (a single step)
As a side effect: parent[4]=0, parent[3]=0, parent[2]=0, parent[1]=0

// Path compression, recursive version (the most concise)
int find(int x) {
    if (_parent[x] != x) {
        _parent[x] = find(_parent[x]);  // recurse, then attach directly to the root
    }
    return _parent[x];
}

// Path compression, iterative version (avoids stack overflow on deep trees)
int find_iterative(int x) {
    int root = x;
    while (_parent[root] != root) {
        root = _parent[root];
    }
    // Second pass: point every node on the path directly at the root
    while (_parent[x] != root) {
        int next = _parent[x];
        _parent[x] = root;
        x = next;
    }
    return root;
}

// A one-pass relative of path compression: path halving
// Every other node on the path is re-pointed to its grandparent
// (no second pass needed; the effect is close to full compression)
int find_halving(int x) {
    while (_parent[x] != x) {
        _parent[x] = _parent[_parent[x]];  // point at the grandparent
        x = _parent[x];                    // then jump to it
    }
    return x;
}

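9.3.4 Union by Rank

Core optimization: during Union, hang the shorter tree under the taller one. This keeps tree height growing slowly, to at most O(log n).

Union by rank:
  rank is an upper bound on the tree's height (its "approximate height")

  When merging: if the two trees differ in height, hang the shorter under the taller → overall height unchanged
                if the two trees have equal height, hang either under the other → overall height +1

  Result: tree height is at most O(log n), so the tree cannot degenerate into a linked list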
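To make the height claim concrete, here is a minimal sketch of union by rank on its own, without path compression, so that tree shapes stay observable. `RankOnlyDSU`, `depth`, `max_depth`, and `max_depth_after_chain_unions` are hypothetical names introduced only for this illustration.

```cpp
#include <algorithm>
#include <numeric>
#include <utility>
#include <vector>

// Sketch: union by rank only (no path compression).
struct RankOnlyDSU {
    std::vector<int> parent, rank_;

    explicit RankOnlyDSU(int n) : parent(n), rank_(n, 0) {
        std::iota(parent.begin(), parent.end(), 0);
    }

    int find(int x) const {
        while (parent[x] != x) x = parent[x];
        return x;
    }

    void unite(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return;
        if (rank_[rx] < rank_[ry]) std::swap(rx, ry);  // rx is now the taller root
        parent[ry] = rx;
        if (rank_[rx] == rank_[ry]) ++rank_[rx];       // equal heights: height grows by one
    }

    // Number of parent links between x and its root.
    int depth(int x) const {
        int d = 0;
        while (parent[x] != x) { x = parent[x]; ++d; }
        return d;
    }

    int max_depth() const {
        int best = 0;
        for (int i = 0; i < static_cast<int>(parent.size()); ++i)
            best = std::max(best, depth(i));
        return best;
    }
};

// Replays the chain-building order Union(1,0), Union(2,1), ... on n elements.
int max_depth_after_chain_unions(int n) {
    RankOnlyDSU d(n);
    for (int i = 1; i < n; ++i) d.unite(i, i - 1);
    return d.max_depth();
}
```

The same call order that turns plain Quick Union into a chain leaves this version with every node one link from the root. Merging equal-height trees pairwise on 8 elements gives a maximum depth of 3, which is log2(8).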

9.3.5 Final Implementation: Path Compression + Union by Rank

#include <numeric>  // std::iota
#include <utility>  // std::swap
#include <vector>

class UnionFind {
    std::vector<int> _parent;
    std::vector<int> _rank;
    int _count;  // number of connected components

public:
    explicit UnionFind(int n) : _parent(n), _rank(n, 0), _count(n) {
        std::iota(_parent.begin(), _parent.end(), 0);
    }

    // ========== Find (path compression) ==========
    int find(int x) {
        if (_parent[x] != x) {
            _parent[x] = find(_parent[x]);  // path compression
        }
        return _parent[x];
    }

    // ========== Union (by rank) ==========
    // Returns true if two different sets were merged, false if x and y were already together.
    bool unite(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return false;  // already in the same set

        // Union by rank: hang the shorter tree under the taller one
        if (_rank[rx] < _rank[ry]) std::swap(rx, ry);
        _parent[ry] = rx;

        // Equal ranks: the merged tree is one level taller
        if (_rank[rx] == _rank[ry]) ++_rank[rx];

        --_count;  // one fewer connected component
        return true;
    }

    // ========== Queries ==========
    bool connected(int x, int y) { return find(x) == find(y); }
    int count() const { return _count; }  // current number of connected components
};

Usage example:

UnionFind uf(10);

uf.unite(0, 1);
uf.unite(2, 3);
uf.unite(0, 3);     // {0,1,2,3} are now one set

uf.connected(0, 3); // true
uf.connected(0, 5); // false
uf.count();          // 7 (10 minus 3 successful merges)

9.3.6 Union by Size: an Alternative

Sometimes it is more convenient to use "set size" instead of "rank", because size has a direct real-world meaning:

#include <numeric>  // std::iota
#include <utility>  // std::swap
#include <vector>

class UnionFindBySize {
    std::vector<int> _parent;
    std::vector<int> _size;  // only meaningful at roots
    int _count;

public:
    explicit UnionFindBySize(int n) : _parent(n), _size(n, 1), _count(n) {
        std::iota(_parent.begin(), _parent.end(), 0);
    }

    int find(int x) {
        if (_parent[x] != x) _parent[x] = find(_parent[x]);
        return _parent[x];
    }

    bool unite(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return false;

        // Hang the smaller set under the larger one
        if (_size[rx] < _size[ry]) std::swap(rx, ry);
        _parent[ry] = rx;
        _size[rx] += _size[ry];  // update the size at the new root

        --_count;
        return true;
    }

    int size_of(int x) { return _size[find(x)]; }
    bool connected(int x, int y) { return find(x) == find(y); }
    int count() const { return _count; }
};

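9.4 Complexity Analysis

α(n): the inverse Ackermann function

With path compression plus union by rank, the amortized cost of a single operation is O(α(n)), where α is the inverse Ackermann function.

α(n) grows extremely slowly. The thresholds below are those of the iterated logarithm log* n, which under the usual textbook definitions is an upper bound on α(n):
  log*(1)         = 0
  log*(2)         = 1
  log*(4)         = 2
  log*(16)        = 3
  log*(65536)     = 4
  log*(2^65536)   = 5  ← 2^65536 is far larger than the number of atoms in the universe

For every n that can occur in practice (n < 2^65536):
  α(n) ≤ log*(n) ≤ 5

So in engineering terms α(n) is effectively a constant → union-find operations are almost O(1)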
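The values listed above match the iterated logarithm log* n (how many times log2 must be applied before the value drops to 1 or below). As a hedged sketch, the helper below reproduces them with integer arithmetic only. `ceil_log2` and `log_star` are illustrative names, and the code computes log*, not α itself.

```cpp
#include <cstdint>

// Smallest k such that 2^k >= n (for n >= 1). The k < 64 guard avoids overflow for huge n.
int ceil_log2(std::uint64_t n) {
    int k = 0;
    std::uint64_t p = 1;
    while (p < n && k < 64) {
        p <<= 1;
        ++k;
    }
    return k;
}

// Iterated logarithm: number of times log2 is applied until the value is <= 1.
int log_star(std::uint64_t n) {
    int steps = 0;
    while (n > 1) {
        n = static_cast<std::uint64_t>(ceil_log2(n));
        ++steps;
    }
    return steps;
}
```

For example, `log_star(65536)` goes 65536 → 16 → 4 → 2 → 1 and returns 4. Every 64-bit input yields at most 5, which is why the bound is treated as a constant.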

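Complexity cheat sheet

Implementation	Find	Union	Notes
Quick Find	O(1)	O(n)	Every merge scans the array
Quick Union	O(n)	O(n)	Tree can degenerate into a chain
+ union by rank only	O(log n)	O(log n)	Tree height ≤ log n
+ path compression only	O(log n) amortized	O(log n) amortized	Amortized analysis
Final version (both combined)	O(α(n)) amortized	O(α(n)) amortized	O(1) in practice

Operation	Time	Space
Initialization	O(n)	O(n)
Find	O(α(n)) ≈ O(1) amortized	O(1) (iterative) / O(log n) (recursion stack, bounded by tree height)
Unite	O(α(n)) ≈ O(1) amortized	O(1)
Connected	O(α(n)) ≈ O(1) amortized	O(1)
Total for m operations	O(m × α(n)) ≈ O(m)	O(n)

💡 How to say it in an interview: "Find and Union on a union-find run in amortized O(α(n)) time, where α is the inverse Ackermann function. It is at most 5 for any realistic input, so the operations can be treated as constant time."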

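9.5 Advanced Variants

9.5.1 Weighted Union-Find

On top of the plain union-find, each parent edge carries a weight describing some "distance" or "ratio" between a node and its parent.

The key difficulty: weights must not be lost during path compression. They have to be multiplied together along the path.

Application: LC 399 Evaluate Division
  a / b = 2.0  →  a = 2 × b  →  weight(a→b) = 2.0
  b / c = 3.0  →  b = 3 × c  →  weight(b→c) = 3.0

  What is a / c?
    a → b → c
    a = 2 × b = 2 × 3 × c = 6 × c
    a / c = 6.0

  During path compression the weights must accumulate.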

Diagram: weighted path compression

Suppose a/b = 2 is stored with b as the root, so weight(a→b) = 2.0
(which node actually becomes the root depends on the merge rule; this shape is chosen for illustration):

    b (root)
    ↑ w=2.0
    a

Then b/c = 3 is stored by hanging b under c, so weight(b→c) = 3.0:

    c (root)
    ↑ w=3.0
    b
    ↑ w=2.0
    a

Now find(a) triggers path compression:
  Step 1: recurse into find(b)
    Step 1.1: recurse into find(c)
      c is the root, return c
    Step 1.2: back at b
      weight[b] *= weight[parent[b]] → weight[b] = 3.0 × 1.0 = 3.0
      parent[b] = c  (already c, unchanged)
  Step 2: back at a
    weight[a] *= weight[parent[a]] → weight[a] = 2.0 × 3.0 = 6.0
    parent[a] = c  (now points directly at the root)

After compression:
    c (root)
   ↗ ↑
  a   b
  w=6 w=3

query(a, c) = weight[a] / weight[c] = 6.0 / 1.0 = 6.0 ✅
query(a, b) = weight[a] / weight[b] = 6.0 / 3.0 = 2.0 ✅

Deriving the weight for a weighted merge

Given:
  a --w[a]--> root_x    (accumulated weight from a to root_x = w[a])
  b --w[b]--> root_y    (accumulated weight from b to root_y = w[b])

Now unite(a, b, val), meaning a/b = val:
  root_x and root_y must be merged

  Assume root_y is hung under root_x:
    weight[root_y] = ?

  Derivation:
    a = w[a] × root_x
    b = w[b] × root_y
    a/b = val  →  w[a] × root_x / (w[b] × root_y) = val
    →  root_y = w[a] × root_x / (val × w[b])
    →  weight[root_y → root_x] = w[a] / (val × w[b])
C++ implementation

#include <numeric>  // std::iota
#include <vector>

class WeightedUnionFind {
    std::vector<int> _parent;
    std::vector<double> _weight;  // weight[i] = ratio i / parent[i]
    std::vector<int> _rank;

public:
    explicit WeightedUnionFind(int n)
        : _parent(n), _weight(n, 1.0), _rank(n, 0)
    {
        std::iota(_parent.begin(), _parent.end(), 0);
    }

    // Path compression that carries weights along
    int find(int x) {
        if (_parent[x] != x) {
            int root = find(_parent[x]);        // after this, the old parent's weight is relative to the root
            _weight[x] *= _weight[_parent[x]];  // accumulate: x/parent × parent/root
            _parent[x] = root;
        }
        return _parent[x];
    }

    // Weighted merge: x / y = value → x = value × y
    void unite(int x, int y, double value) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return;

        if (_rank[rx] < _rank[ry]) {
            _parent[rx] = ry;
            _weight[rx] = value * _weight[y] / _weight[x];    // rx / ry
        } else {
            _parent[ry] = rx;
            _weight[ry] = _weight[x] / (value * _weight[y]);  // ry / rx
            if (_rank[rx] == _rank[ry]) ++_rank[rx];
        }
    }

    // Returns x / y, or -1.0 if x and y are not in the same set
    double query(int x, int y) {
        if (find(x) != find(y)) return -1.0;
        return _weight[x] / _weight[y];
    }
};

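💡 The essence of weighted union-find: a plain union-find maintains the yes/no relation "who is in the same group as whom", while a weighted union-find maintains the quantitative relation "who is how many times whom". After Find, weight[x] is the accumulated ratio from x to the root, and the relation between any two nodes x and y in the same group is weight[x] / weight[y].

Another use of weighted union-find: "species" (parity) union-find

Problem: decide whether a set of friend/enemy statements about animals is contradictory (the same idea as LC 399, with XOR in place of multiplication)
  - "A and B are friends" (same kind)
  - "A and B are enemies" (different kinds)

Method: weight = 0 means "same kind", weight = 1 means "different kind"
  When merging, relations propagate through XOR

  A-B friends, B-C friends → A-C? XOR(0,0) = 0 → friends ✅
  A-B friends, B-C enemies → A-C? XOR(0,1) = 1 → enemies ✅
  A-B enemies, B-C enemies → A-C? XOR(1,1) = 0 → friends ✅ (the enemy of my enemy is my friend)

The same approach also applies to LC 886 (Possible Bipartition).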
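The XOR rule above can be sketched as a small parity union-find. This is an illustration under stated assumptions, not code from the chapter: the class name `ParityUnionFind` and its methods `add_relation` and `relation` are made up here, and union by rank is omitted to keep the sketch short.

```cpp
#include <numeric>
#include <vector>

// Sketch of a parity ("friend or enemy") union-find.
// _parity[x] is the relation of x to its parent: 0 = same kind, 1 = opposite kind.
class ParityUnionFind {
    std::vector<int> _parent;
    std::vector<int> _parity;

public:
    explicit ParityUnionFind(int n) : _parent(n), _parity(n, 0) {
        std::iota(_parent.begin(), _parent.end(), 0);
    }

    // Path compression; parities are combined with XOR instead of multiplication.
    int find(int x) {
        if (_parent[x] != x) {
            int old_parent = _parent[x];
            int root = find(old_parent);        // old_parent's parity is now relative to the root
            _parity[x] ^= _parity[old_parent];  // x→parent XOR parent→root
            _parent[x] = root;
        }
        return _parent[x];
    }

    // relation: 0 = friends, 1 = enemies.
    // Returns false if the statement contradicts what is already known.
    bool add_relation(int x, int y, int relation) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return (_parity[x] ^ _parity[y]) == relation;
        _parent[ry] = rx;
        _parity[ry] = _parity[x] ^ _parity[y] ^ relation;  // ry relative to rx
        return true;
    }

    // Returns 0 (friends), 1 (enemies), or -1 if nothing links x and y yet.
    int relation(int x, int y) {
        if (find(x) != find(y)) return -1;
        return _parity[x] ^ _parity[y];
    }
};
```

Recording "0 and 1 are enemies" and "1 and 2 are enemies" makes `relation(0, 2)` report friends. A later claim that 0 and 2 are enemies is rejected as a contradiction.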

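9.5.2 Persistent Union-Find (Overview)

A plain union-find only supports merging; it cannot undo a merge.
A persistent union-find keeps historical versions:
  - Version 0: {A}, {B}, {C}, {D}
  - Version 1: Union(A,B) → {A,B}, {C}, {D}
  - Version 2: Union(C,D) → {A,B}, {C,D}
  - Roll back to version 1: {A,B}, {C}, {D}

Implementation: a persistent array (built on a persistent segment tree or a Treap)
Interviews: almost never asked; this is advanced competitive-programming material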
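Full persistence needs a persistent array, which is beyond a short example. A simpler relative that covers the "roll back to the previous version" case is a union-find with undo. The sketch below swaps in that technique: union by size, deliberately no path compression (so each merge changes only one parent entry and one size entry), and a history stack. `RollbackUnionFind` and `undo` are hypothetical names.

```cpp
#include <numeric>
#include <utility>
#include <vector>

// Sketch: union-find that can undo the most recent unite() calls (not fully persistent).
class RollbackUnionFind {
    std::vector<int> _parent, _size;
    std::vector<int> _history;  // root attached by each unite(), or -1 for a no-op

public:
    explicit RollbackUnionFind(int n) : _parent(n), _size(n, 1) {
        std::iota(_parent.begin(), _parent.end(), 0);
    }

    // No path compression, so the tree shape is changed only by unite() and undo().
    int find(int x) const {
        while (_parent[x] != x) x = _parent[x];
        return x;
    }

    bool unite(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) {
            _history.push_back(-1);  // record the no-op so undo() stays aligned with calls
            return false;
        }
        if (_size[rx] < _size[ry]) std::swap(rx, ry);
        _parent[ry] = rx;
        _size[rx] += _size[ry];
        _history.push_back(ry);
        return true;
    }

    // Reverts the most recent unite() call; does nothing if there is no history.
    void undo() {
        if (_history.empty()) return;
        int ry = _history.back();
        _history.pop_back();
        if (ry < 0) return;
        int rx = _parent[ry];
        _size[rx] -= _size[ry];
        _parent[ry] = ry;
    }

    bool connected(int x, int y) const { return find(x) == find(y); }
};
```

With A..D mapped to 0..3, calling `unite(0, 1)`, `unite(2, 3)`, then `undo()` returns to version 1 from the list above. Union by size keeps Find at O(log n) even without compression.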

9.5.3 Union-Find + Extra Information

A common enhancement pattern in interviews:

#include <algorithm>  // std::min, std::max
#include <numeric>    // std::iota
#include <utility>    // std::swap
#include <vector>

// Maintains extra per-set information, stored at each set's root
class EnhancedUnionFind {
    std::vector<int> _parent, _rank;
    std::vector<int> _size;      // set size
    std::vector<int> _min_val;   // minimum value in the set
    std::vector<int> _max_val;   // maximum value in the set

public:
    // values must contain exactly n entries
    EnhancedUnionFind(int n, const std::vector<int>& values)
        : _parent(n), _rank(n, 0), _size(n, 1),
          _min_val(values), _max_val(values)
    {
        std::iota(_parent.begin(), _parent.end(), 0);
    }

    int find(int x) {
        if (_parent[x] != x) _parent[x] = find(_parent[x]);
        return _parent[x];
    }

    void unite(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return;

        if (_rank[rx] < _rank[ry]) std::swap(rx, ry);
        _parent[ry] = rx;

        // Merge the extra information into the surviving root
        _size[rx] += _size[ry];
        _min_val[rx] = std::min(_min_val[rx], _min_val[ry]);
        _max_val[rx] = std::max(_max_val[rx], _max_val[ry]);

        if (_rank[rx] == _rank[ry]) ++_rank[rx];
    }

    int get_size(int x) { return _size[find(x)]; }
    int get_min(int x) { return _min_val[find(x)]; }
    int get_max(int x) { return _max_val[find(x)]; }
};

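9.6 Frequently Asked Interview Problems

Number of Connected Components (LeetCode 323)

Given n nodes and a list of edges, count the connected components of the undirected graph.

This is the template union-find problem: Union every edge, and the final count is the answer:

int countComponents(int n, std::vector<std::vector<int>>& edges) {
    UnionFind uf(n);

    for (auto& e : edges) {
        uf.unite(e[0], e[1]);
    }

    return uf.count();
}
// Time O(E × α(V)), space O(V)

💡 DFS/BFS can also count connected components (covered in Section 7.1). The advantage of union-find is that it is dynamic: you can query connectivity while edges are still being added, without knowing all the edges up front.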

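Redundant Connection (LeetCode 684)

Given an undirected graph with n nodes and n edges (it should have been a tree, but has one edge too many), find the extra edge.

Section 7.3 already solved this with union-find; here is the complete code again:

std::vector<int> findRedundantConnection(std::vector<std::vector<int>>& edges) {
    int n = edges.size();
    UnionFind uf(n + 1);  // nodes are numbered 1..n

    for (auto& e : edges) {
        if (!uf.unite(e[0], e[1])) {
            return e;  // endpoints already connected → this edge is redundant
        }
    }

    return {};
}
// Time O(n × α(n)), space O(n)

💡 Why union-find detects cycles: when adding edge (u,v), if Find(u) == Find(v), then u and v are already in the same connected component, so adding the edge closes a cycle. This is exactly the logic used in Kruskal's MST algorithm.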

Accounts Merge (LeetCode 721)

Given several accounts (each with a name and some emails), accounts that share an email belong to the same person. Merge all emails belonging to the same person.

Idea: each email is a node, and emails within the same account are Unioned together. At the end, group by root:

#include <set>
#include <string>
#include <unordered_map>
#include <vector>

std::vector<std::vector<std::string>> accountsMerge(
    std::vector<std::vector<std::string>>& accounts)
{
    // email → integer id
    std::unordered_map<std::string, int> email_id;
    std::unordered_map<std::string, std::string> email_name;
    int id = 0;

    for (auto& acc : accounts) {
        const std::string& name = acc[0];
        for (int i = 1; i < static_cast<int>(acc.size()); ++i) {
            if (email_id.find(acc[i]) == email_id.end()) {
                email_id[acc[i]] = id++;
            }
            email_name[acc[i]] = name;
        }
    }

    // Union-find: Union the emails within each account
    UnionFind uf(id);
    for (auto& acc : accounts) {
        int first = email_id[acc[1]];
        for (int i = 2; i < static_cast<int>(acc.size()); ++i) {
            uf.unite(first, email_id[acc[i]]);
        }
    }

    // Group by root; std::set keeps each group's emails sorted
    std::unordered_map<int, std::set<std::string>> groups;
    for (auto& [email, eid] : email_id) {
        groups[uf.find(eid)].insert(email);
    }

    // Assemble the result
    std::vector<std::vector<std::string>> result;
    for (auto& [root, emails] : groups) {
        std::vector<std::string> merged;
        // Look up the name through any email in the group
        merged.push_back(email_name[*emails.begin()]);
        for (auto& e : emails) {
            merged.push_back(e);
        }
        result.push_back(std::move(merged));
    }

    return result;
}
// Time O(n × m × α(n×m)) for the unions, plus O(n × m × log(n×m)) for sorting via std::set; space O(n × m)
// n = number of accounts, m = average number of emails per account

💡 The key to this problem: emails within one account are Unioned; when different accounts share an email, email_id maps it to the same node, so the accounts are joined automatically. The output is then grouped by root.

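Satisfiability of Equality Equations (LeetCode 990)

Given a set of equalities and inequalities (such as ["a==b","b!=c","c==a"]), decide whether they contradict each other.

bool equationsPossible(std::vector<std::string>& equations) {
    UnionFind uf(26);  // 26 lowercase letters

    // Pass 1: process every == (merge)
    for (auto& eq : equations) {
        if (eq[1] == '=') {
            uf.unite(eq[0] - 'a', eq[3] - 'a');
        }
    }

    // Pass 2: check every != (look for a contradiction)
    for (auto& eq : equations) {
        if (eq[1] == '!') {
            if (uf.connected(eq[0] - 'a', eq[3] - 'a')) {
                return false;  // a==b is implied, yet a!=b is required → contradiction
            }
        }
    }

    return true;
}
// Time O(n × α(26)) ≈ O(n), space O(1)

💡 Two-pass scan: first process the "equal" relations (Union), then check whether any "not equal" relation conflicts with the equivalence classes already built. The order cannot be reversed: you must know which variables are equivalent before you can judge an inequality.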

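Longest Consecutive Sequence (LeetCode 128)

Given an unsorted array, find the length of the longest run of consecutive integers. O(n) time is required.

Union-find solution: Union every pair of adjacent numbers (x and x+1), then find the largest set:

int longestConsecutive(std::vector<int>& nums) {
    std::unordered_map<int, int> num_to_id;
    int n = nums.size();
    if (n == 0) return 0;

    // Deduplicate and assign ids
    int id = 0;
    for (int num : nums) {
        if (num_to_id.find(num) == num_to_id.end()) {
            num_to_id[num] = id++;
        }
    }

    UnionFindBySize uf(id);

    // For each number, if num+1 exists, Union the two
    // (num + 1 cannot overflow under the problem's value constraints)
    for (auto& [num, nid] : num_to_id) {
        if (num_to_id.count(num + 1)) {
            uf.unite(nid, num_to_id[num + 1]);
        }
    }

    // Find the largest set
    int max_size = 1;
    for (auto& [num, nid] : num_to_id) {
        max_size = std::max(max_size, uf.size_of(nid));
    }

    return max_size;
}
// Time O(n × α(n)) expected (hash map lookups), effectively O(n); space O(n)

💡 The more common solution to this problem is a hash set plus a search for sequence starts, but the union-find solution shows the neat mapping "consecutive = connected". If you have just discussed union-find in an interview, using this problem to demonstrate it is a plus.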
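For comparison, the hash-set approach mentioned above can be sketched as follows. The function name `longest_consecutive_hashset` is an assumption for illustration. The INT_MIN / INT_MAX guards are added only to keep `x - 1` and `cur + 1` from overflowing on arbitrary input.

```cpp
#include <algorithm>
#include <climits>
#include <unordered_set>
#include <vector>

// Sketch: hash set + "only start counting at the beginning of a run".
int longest_consecutive_hashset(const std::vector<int>& nums) {
    std::unordered_set<int> seen(nums.begin(), nums.end());
    int best = 0;
    for (int x : seen) {
        // x is a run start only if x - 1 is absent.
        if (x != INT_MIN && seen.count(x - 1)) continue;

        int len = 1;
        int cur = x;
        while (cur != INT_MAX && seen.count(cur + 1)) {
            ++cur;
            ++len;
        }
        best = std::max(best, len);
    }
    return best;
}
```

Each value is visited at most twice (once as a candidate start, once while extending a run), so the expected time is O(n) without any union-find bookkeeping.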

Number of Islands, Union-Find Version (LeetCode 200)

The same Number of Islands problem as in Section 7.1, solved with union-find instead of DFS.

int numIslands(std::vector<std::vector<char>>& grid) {
    int m = grid.size(), n = grid[0].size();  // assumes a non-empty grid
    UnionFind uf(m * n);
    int water = 0;

    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            if (grid[i][j] == '0') {
                ++water;
                continue;
            }

            // Merge only rightward and downward (avoids handling each pair twice)
            if (i + 1 < m && grid[i + 1][j] == '1') {
                uf.unite(i * n + j, (i + 1) * n + j);
            }
            if (j + 1 < n && grid[i][j + 1] == '1') {
                uf.unite(i * n + j, i * n + j + 1);
            }
        }
    }

    // Each water cell is still its own singleton component, so subtract them
    return uf.count() - water;
}
// Time O(m×n × α(m×n)), space O(m×n)

Evaluate Division (LeetCode 399)

Given some division equations (such as a/b=2, b/c=3), compute other quotients (such as a/c=?).

A classic application of weighted union-find:

std::vector<double> calcEquation(
    std::vector<std::vector<std::string>>& equations,
    std::vector<double>& values,
    std::vector<std::vector<std::string>>& queries)
{
    // variable name → integer id
    std::unordered_map<std::string, int> var_id;
    int id = 0;
    for (auto& eq : equations) {
        if (!var_id.count(eq[0])) var_id[eq[0]] = id++;
        if (!var_id.count(eq[1])) var_id[eq[1]] = id++;
    }

    WeightedUnionFind uf(id);

    // Record each relation: eq[0] / eq[1] = values[i]
    for (int i = 0; i < static_cast<int>(equations.size()); ++i) {
        uf.unite(var_id[equations[i][0]], var_id[equations[i][1]], values[i]);
    }

    // Answer the queries; unknown variables yield -1.0
    std::vector<double> result;
    for (auto& q : queries) {
        if (!var_id.count(q[0]) || !var_id.count(q[1])) {
            result.push_back(-1.0);
        } else {
            result.push_back(uf.query(var_id[q[0]], var_id[q[1]]));
        }
    }

    return result;
}

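Redundant Connection II, Directed Version (LeetCode 685)

Given a rooted (directed) tree with one extra edge that makes it no longer a tree, find the extra edge. Note: if there are several valid answers, return the one that appears last in the input.

🧠 Deriving the idea (how you might arrive at it in an interview):

Step 1: How it differs from LC 684
  LC 684 is undirected, so union-find cycle detection is enough.
  LC 685 is directed, and the extra edge can cause:
    Case A: some node has 2 parents (in-degree = 2), no cycle
    Case B: some node has 2 parents (in-degree = 2), and there is a cycle
    Case C: no node has in-degree 2, but there is a cycle (one edge on the cycle is redundant)

Step 2: Observation
  Tree property: every node except the root has exactly 1 parent (in-degree = 1)
  One extra edge → either some node's in-degree becomes 2, or a cycle forms

Step 3: Plan
  1. Scan all edges for a node with in-degree 2 (if any)
     → Record the two edges pointing at it: candidate1 (appears first) and candidate2 (appears later)
  2. Try removing candidate2 first (the problem asks for the last one in the input)
     → Add the remaining edges with union-find; if there is no cycle → candidate2 is the answer
     → If there is a cycle → candidate2 is not the culprit, so candidate1 is the answer
  3. If no node has in-degree 2 → Case C
     → Plain union-find cycle detection (same as LC 684)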
std::vector<int> findRedundantDirectedConnection(
    std::vector<std::vector<int>>& edges)
{
    int n = edges.size();
    std::vector<int> parent(n + 1, 0);  // parent[i] = parent of i; 0 means "none yet" (nodes are 1..n)

    // Step 1: look for a node with in-degree 2
    std::vector<int> cand1, cand2;
    for (auto& e : edges) {
        if (parent[e[1]] == 0) {
            parent[e[1]] = e[0];  // record the parent
        } else {
            // e[1] already has a parent → in-degree 2
            cand1 = {parent[e[1]], e[1]};  // the edge that appeared first
            cand2 = e;                      // the edge that appeared later
        }
    }

    // Reuse the array as a union-find
    std::iota(parent.begin(), parent.end(), 0);

    auto find = [&](int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // path halving (a one-pass form of path compression)
            x = parent[x];
        }
        return x;
    };

    for (auto& e : edges) {
        if (e == cand2) continue;  // try removing cand2 first

        int rx = find(e[0]), ry = find(e[1]);
        if (rx == ry) {
            // Cycle found
            // If cand1 exists → cand2 was not the problem, cand1 is
            // If cand1 does not exist → this edge is the redundant one (Case C)
            return cand1.empty() ? e : cand1;
        }
        parent[ry] = rx;
    }

    // No cycle after removing cand2 → cand2 is the redundant edge
    return cand2;
}
// Time O(n × α(n)), space O(n)

💡 Interview value: LC 685 is among the hardest union-find problems. Interviewers use it to test case analysis: can you identify the three cases and handle them uniformly? Core insight: in a directed graph, the extra edge either breaks the property "every non-root node has exactly one parent" (in-degree 2) or introduces a cycle.

9.7 🎮 Game Development Scenarios

Map connectivity validation

// After generating a roguelike map, verify that all rooms are connected.
// If they are not, extra corridors must be added.
// Requires <vector> and <unordered_set>, plus the UnionFind class from 9.3.5.

class MapConnectivityChecker {
    int _rows, _cols;
    UnionFind _uf;

    int _encode(int x, int y) const { return x * _cols + y; }

public:
    MapConnectivityChecker(int rows, int cols)
        : _rows(rows), _cols(cols), _uf(rows * cols) {}

    // Connect two adjacent walkable cells
    void connect(int x1, int y1, int x2, int y2) {
        _uf.unite(_encode(x1, y1), _encode(x2, y2));
    }

    // Scan the map and merge all adjacent walkable cells
    void scan_map(const std::vector<std::vector<int>>& map) {
        for (int i = 0; i < _rows; ++i) {
            for (int j = 0; j < _cols; ++j) {
                if (map[i][j] == 0) continue;  // wall

                if (i + 1 < _rows && map[i + 1][j])
                    _uf.unite(_encode(i, j), _encode(i + 1, j));
                if (j + 1 < _cols && map[i][j + 1])
                    _uf.unite(_encode(i, j), _encode(i, j + 1));
            }
        }
    }

    // Check whether two positions are connected
    bool is_reachable(int x1, int y1, int x2, int y2) {
        return _uf.connected(_encode(x1, y1), _encode(x2, y2));
    }

    // Count distinct walkable regions → if > 1, extra corridors are needed
    int region_count(const std::vector<std::vector<int>>& map) {
        std::unordered_set<int> roots;
        for (int i = 0; i < _rows; ++i) {
            for (int j = 0; j < _cols; ++j) {
                if (map[i][j]) {
                    roots.insert(_uf.find(_encode(i, j)));
                }
            }
        }
        return static_cast<int>(roots.size());
    }
};

Party / guild merging

// In an MMO, players can form parties or join guilds
// Merging two guilds → Union
// Checking whether two players share a guild → Connected

class GuildSystem {
    UnionFindBySize _uf;
    std::unordered_map<int, std::string> _guild_names;  // keyed by the founding leader's id

public:
    explicit GuildSystem(int max_players) : _uf(max_players) {}

    // Create a guild (the leader becomes its representative)
    void create_guild(int leader, const std::string& name) {
        _guild_names[leader] = name;
    }

    // A player joins a guild (Union with the guild leader)
    void join_guild(int player, int guild_leader) {
        _uf.unite(player, guild_leader);
    }

    // Merge two guilds
    void merge_guilds(int leader_a, int leader_b) {
        _uf.unite(leader_a, leader_b);
    }

    // Check whether two players are in the same guild
    bool same_guild(int a, int b) {
        return _uf.connected(a, b);
    }

    // Number of members in the player's guild
    int guild_size(int player) {
        return _uf.size_of(player);
    }
};

// Usage:
// GuildSystem gs(10000);
// gs.create_guild(1, "Storm Legion");
// gs.join_guild(2, 1);   // player 2 joins player 1's guild
// gs.join_guild(3, 1);   // player 3 joins
// gs.same_guild(2, 3);   // true
// gs.guild_size(1);      // 3

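Dynamic map generation: room connectivity

// Section 7.3 generated corridors with Kruskal's MST; its core is union-find:
// 1. Place rooms at random
// 2. Create an edge between every pair of rooms
// 3. Kruskal: sort edges by distance, then process them one by one
//    → If the two rooms are already connected, skip the edge (avoids a cycle)
//    → If they are not connected, add a corridor and Union them
// 4. Once n-1 edges are chosen → all rooms are connected

// Role of union-find here: an O(α(n)) check of whether two rooms are already connected
// Without union-find, Kruskal needs an O(V+E) DFS/BFS per edge to test connectivity, which is far slower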
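The Kruskal steps listed above can be sketched as follows. This is an illustrative version under stated assumptions: the `Edge` struct and the `kruskal_mst` function are hypothetical names, rooms are numbered 0..n-1, and a compact inline union-find with path halving is used instead of the chapter's `UnionFind` class so the block stays self-contained.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical corridor candidate between rooms u and v with length w.
struct Edge {
    int u, v;
    double w;
};

// Returns the corridors chosen by Kruskal's algorithm.
std::vector<Edge> kruskal_mst(int n, std::vector<Edge> edges) {
    std::vector<int> parent(n);
    std::iota(parent.begin(), parent.end(), 0);

    auto find = [&](int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // path halving
            x = parent[x];
        }
        return x;
    };

    // Shortest corridors first.
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) { return a.w < b.w; });

    std::vector<Edge> chosen;
    for (const Edge& e : edges) {
        int ru = find(e.u), rv = find(e.v);
        if (ru == rv) continue;  // already connected: adding this edge would close a cycle
        parent[rv] = ru;         // Union
        chosen.push_back(e);
        if (static_cast<int>(chosen.size()) == n - 1) break;  // spanning tree complete
    }
    return chosen;
}
```

If the candidate edges do not connect every room, the result has fewer than n-1 edges. A caller can use that as the signal that extra corridors are needed.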

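Network topology connectivity checks

// P2P network in a multiplayer game:
// Each player is a node, and each direct connection is an edge
// When a player drops, we need to detect whether the network has split

// Option 1: static check (with union-find)
// Rebuild the union-find without any of the dropped player's connections, then check whether count > 1

// Option 2: dynamic monitoring (more complex, but real-time)
// Maintain one union-find; a drop would require a "split" operation
// Standard union-find cannot split → use a "reverse-time union-find" when the drops are known in advance
// (process time backwards: start from the final state and add edges back, one Union per edge)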
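The reverse-time idea can be sketched as an offline routine: mark every player who will drop as offline, build the final state, then bring players back in reverse order while recording the component count. Everything here is an assumption made for illustration: the function name `components_after_each_drop`, the adjacency-list input, and the requirement that each player appears in `offline_order` at most once.

```cpp
#include <numeric>
#include <vector>

// result[k] = number of connected components among online players
// after the first k players of offline_order have dropped.
std::vector<int> components_after_each_drop(
    int n,
    const std::vector<std::vector<int>>& adj,  // adj[p] = direct peers of player p
    const std::vector<int>& offline_order)
{
    std::vector<int> parent(n);
    std::iota(parent.begin(), parent.end(), 0);
    std::vector<char> drops(n, 0), online(n, 0);
    for (int p : offline_order) drops[p] = 1;

    auto find = [&](int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // path halving
            x = parent[x];
        }
        return x;
    };

    int comps = 0;
    // Bring player p online and Union it with every peer that is already online.
    auto activate = [&](int p) {
        online[p] = 1;
        ++comps;
        for (int q : adj[p]) {
            if (!online[q]) continue;
            int a = find(p), b = find(q);
            if (a != b) {
                parent[b] = a;
                --comps;
            }
        }
    };

    // Final state: only the players who never drop are online.
    for (int p = 0; p < n; ++p) {
        if (!drops[p]) activate(p);
    }

    int k = static_cast<int>(offline_order.size());
    std::vector<int> result(k + 1);
    result[k] = comps;
    // Walk time backwards: undoing a drop is just an activation.
    for (int i = k - 1; i >= 0; --i) {
        activate(offline_order[i]);
        result[i] = comps;
    }
    return result;
}
```

For a three-player line 0-1-2 where player 1 drops, the result is {1, 2}: one component before the drop and two after it. Because the whole drop sequence must be known up front, this only fits replay or analysis, not live monitoring.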

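Game scenario summary

Game system	How union-find is used	Key operations
Map connectivity	cell = node, adjacent walkable cells = edge	Union + count → check whether everything is connected
Party/guild	player = node, same party/guild = same set	Union(join) + Connected(same_guild)
Kruskal map generation	room = node, corridor = edge	Union + Connected → cycle check / edge selection
Network topology	device = node, connection = edge	Union → connectivity check
Flood fill	cell = node, adjacent same-color cells = edge	Union same-color neighbors → compute region size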

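9.8 Interview Problem Cheat Sheet

No.	Problem	Core technique	Difficulty
LC 323	Number of Connected Components	Union-find template	Medium
LC 684	Redundant Connection	Union-find cycle detection	Medium
LC 721	Accounts Merge	Union-find + email mapping	Medium
LC 990	Satisfiability of Equality Equations	Union-find + two-pass scan	Medium
LC 128	Longest Consecutive Sequence	Union-find / hash set	Medium
LC 200	Number of Islands	Union-find / DFS	Medium
LC 399	Evaluate Division	Weighted union-find	Medium
LC 685	Redundant Connection II (directed)	Union-find + in-degree analysis	Hard
LC 547	Number of Provinces	Union-find / DFS	Medium
LC 1319	Number of Operations to Make Network Connected	Union-find + counting spare edges	Medium
LC 952	Largest Component Size by Common Factor	Union-find + prime factorization	Hard
LC 1202	Smallest String With Swaps	Union-find + sorting within groups	Medium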

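9.9 Where Union-Find Fits in This Series

Links to other chapters

Union-find is a "glue" data structure:
  - The core component of Kruskal's MST (Section 7.3)
  - Redundant Connection = a special case of MST cycle detection
  - Number of Islands = counting connected components (an alternative to DFS/BFS)
  - Friend circles in a social network = partitioning into equivalence classes
  - Equality equations = a transitive equivalence relation

Union-find is a poor fit when you need to:
  - Delete elements or split sets (standard union-find only supports merging)
  - Iterate over all members of a set (union-find keeps no member lists)
  - Maintain ordering (union-find provides no sorting)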

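9.10 Chapter Summary

Key points

Concept	Key point
Union-find	A data structure for dynamic connectivity, supporting Union and Find
Path compression	During Find, attach nodes on the path directly to the root → flatter tree
Union by rank	During Union, hang the shorter tree under the taller one → tree height ≤ log n
α(n)	Inverse Ackermann function, ≤ 5 in practice → near O(1)
Storage	One parent array + one rank/size array
Core abilities	Test connectivity, merge sets, count connected components
Cannot do	Split sets, iterate over set members, delete elements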

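30-Second Interview Answers

Q: How does union-find work? Why is it close to O(1)?
A: Union-find represents disjoint sets as a forest: each tree is one set, and its root is the representative. Find follows parent links to the root, and Union hangs one root under another. Two optimizations bring it close to O(1): path compression re-points nodes on the Find path directly at the root (flattening the tree), and union by rank hangs the shorter tree under the taller one (so the tree cannot degenerate into a chain). Combined, a single operation costs amortized O(α(n)), where α is the inverse Ackermann function, which is ≤ 5 for any realistic input.

Q: What do path compression and union by rank each do? Is one of them enough?
A: Path compression speeds up Find: nodes on the path all point at the root, so later accesses are O(1). Union by rank prevents degeneration: it guarantees tree height ≤ log n. With path compression alone, a single Find is amortized O(log n); with union by rank alone, a single Find is O(log n) in the worst case. Only the combination gives O(α(n)) ≈ O(1). In interviews, people usually write both.

Q: How does union-find differ from DFS/BFS for finding connected components?
A: The results are the same, but union-find supports adding edges dynamically: each new edge costs only O(α(n)), with no need to traverse the whole graph again. DFS/BFS is an offline approach that needs all edges up front to compute the components in one go. When edges arrive over time (as when Kruskal picks edges one by one), union-find is far better than rerunning DFS each time.

Q: Why does Kruskal use union-find?
A: Kruskal sorts edges by weight and then picks them one at a time. Each pick must check whether the two endpoints are already connected, because adding an edge between connected endpoints creates a cycle. Union-find answers that in O(α(n)), much faster than an O(V+E) DFS/BFS per edge. Kruskal's total time = sorting O(E log E) + E union-find operations O(E × α(V)).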

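📖 Previous chapter: Chapter 8, Tries: The Power of Prefixes
📖 Next chapter: Chapter 10, A Guide to Choosing Data Structures: a side-by-side review of 9 major data structures, mappings to game scenarios, and a structured template for interview answers.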
