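Iteration Tools

Python provides a set of built-in functions and the itertools module for creating, combining, and consuming iterators efficiently. Most of them (enumerate, zip, map, filter, and the itertools functions) return lazy iterators, which keeps memory use low on large inputs while keeping the code concise. The aggregating functions (sum, min, max, sorted, any, all) consume their input and return a finished value.

Enumerating and Pairing

enumerate

enumerate(iterable, start=0) attaches a counter to an iterable and returns an iterator of (index, value) tuples. It is the preferred tool when a loop needs both the element and its index.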

names = ["Alice", "Bob", "Charlie"]

# Basic usage
for i, name in enumerate(names):
    print(f"{i}: {name}")
# 0: Alice
# 1: Bob
# 2: Charlie

# Custom starting index
for rank, name in enumerate(names, start=1):
    print(f"Rank {rank}: {name}")
# Rank 1: Alice
# Rank 2: Bob
# Rank 3: Charlie

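Because enumerate returns an iterator of pairs, each pair can be unpacked directly in a loop, and the whole result can be passed to dict():

# Convert to a dictionary
index_map = dict(enumerate(names))
print(index_map)  # {0: 'Alice', 1: 'Bob', 2: 'Charlie'}

# Find the index of the first element that satisfies a condition
lines = ["import os", "", "def main():", "    pass"]
for i, line in enumerate(lines):
    if line.startswith("def "):
        print(f"Function definition at index {i}")  # index 2 (0-based)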
        break
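
The loop above reports a 0-based index. When the goal is a line number as an editor would show it, passing start=1 avoids a manual i + 1. A minimal sketch, reusing the same sample lines (the name def_lines is only illustrative):

```python
lines = ["import os", "", "def main():", "    pass"]

# start=1 makes the counter match 1-based line numbers.
def_lines = [
    lineno
    for lineno, line in enumerate(lines, start=1)
    if line.startswith("def ")
]
print(def_lines)  # [3]
```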

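zip

zip(*iterables) aggregates several iterables in parallel and returns an iterator of tuples, where the i-th tuple holds the i-th element from each input. By default, zip stops as soon as the shortest input is exhausted.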

names = ["Alice", "Bob", "Charlie"]
scores = [85, 92, 78]

for name, score in zip(names, scores):
    print(f"{name}: {score}")
# Alice: 85
# Bob: 92
# Charlie: 78

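zip is commonly used to walk several sequences in step, or to build a dictionary:

# Build a dictionary from two lists
keys = ["a", "b", "c"]
values = [1, 2, 3]
print(dict(zip(keys, values)))  # {'a': 1, 'b': 2, 'c': 3}

# Unequal lengths: zip truncates to the shortest input
a = [1, 2, 3, 4]
b = ["x", "y"]
print(list(zip(a, b)))  # [(1, 'x'), (2, 'y')]

Since Python 3.10, zip accepts strict=True, which requires all inputs to have the same length and raises ValueError otherwise. This is useful when the data must stay aligned.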

# names and scores must have the same length
for name, score in zip(names, scores, strict=True):
    print(f"{name}: {score}")

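Transforming and Filtering

map

map(function, iterable, *iterables) applies function to every element and returns an iterator of the results. When several iterables are given, function receives one argument from each and map stops at the shortest input. Python 2 allowed None as the function, which made map behave like zip; Python 3 does not, and map(None, ...) fails with TypeError once iterated because None is not callable. Use zip instead.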

# Convert a list of strings to integers
str_nums = ["1", "2", "3"]
ints = map(int, str_nums)
print(list(ints))  # [1, 2, 3]

# With a lambda
lengths = map(lambda s: len(s), ["apple", "banana", "cherry"])
print(list(lengths))  # [5, 6, 6]

# map with multiple iterables
import operator
a = [1, 2, 3]
b = [4, 5, 6]
sums = map(operator.add, a, b)
print(list(sums))  # [5, 7, 9]

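map is lazy, which makes it suitable for large inputs. For simple cases a list comprehension is usually easier to read; when an existing function can be applied as-is, map is more concise.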
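
Laziness is easy to observe by recording when the mapped function actually runs. A small sketch, where parse and calls are hypothetical names introduced only for this demonstration:

```python
calls = []

def parse(text):
    # Record each call so we can see when map invokes the function.
    calls.append(text)
    return int(text)

numbers = map(parse, ["1", "2", "3"])
print(calls)         # [] (nothing has been computed yet)

first = next(numbers)
print(first, calls)  # 1 ['1']

rest = list(numbers)
print(rest, calls)   # [2, 3] ['1', '2', '3']
```

Once consumed, a map object is exhausted; calling list(numbers) again returns an empty list.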

filter

filter(function, iterable) returns an iterator containing only the elements for which function returns a truthy value. If function is None, every falsy element is removed.

# Keep only the even numbers
nums = [1, 2, 3, 4, 5, 6]
evens = filter(lambda x: x % 2 == 0, nums)
print(list(evens))  # [2, 4, 6]

# Drop falsy values
mixed = [0, 1, False, True, "", "hello", None, []]
truthy = filter(None, mixed)
print(list(truthy))  # [1, True, 'hello']
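
For comparison, the sketch below puts filter next to its complement, itertools.filterfalse, and the equivalent generator expression. The helper is_even is a hypothetical name used only here.

```python
from itertools import filterfalse

nums = [1, 2, 3, 4, 5, 6]

def is_even(n):
    return n % 2 == 0

evens = list(filter(is_even, nums))       # elements where is_even is truthy
odds = list(filterfalse(is_even, nums))   # elements where is_even is falsy
evens_genexp = list(n for n in nums if is_even(n))

print(evens)         # [2, 4, 6]
print(odds)          # [1, 3, 5]
print(evens_genexp)  # [2, 4, 6]
```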

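Aggregation and Testing

any and all

any(iterable) returns True if at least one element is truthy, and False otherwise. all(iterable) returns True if every element is truthy, and False otherwise. Both short-circuit: they stop iterating as soon as the result is determined.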

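# any: check whether any number is positive
nums = [-1, -2, 0, 3, -4]
print(any(x > 0 for x in nums))  # True; stops as soon as it reaches 3

# all: check whether every score is a pass
scores = [85, 92, 78, 88]
print(all(s >= 60 for s in scores))  # True

# Edge cases with empty input
print(any([]))  # False
print(all([]))  # True (vacuous truth: a universal claim over an empty set holds)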
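
Short-circuiting can be made visible by logging which elements are actually pulled from the source. In this sketch, traced is a hypothetical helper generator written for the demonstration:

```python
def traced(values, seen):
    # Yield each value, recording it in `seen` as it is consumed.
    for v in values:
        seen.append(v)
        yield v

seen_any = []
found = any(v > 0 for v in traced([-1, -2, 0, 3, -4], seen_any))
print(found, seen_any)  # True [-1, -2, 0, 3]

seen_all = []
ok = all(v >= 60 for v in traced([85, 40, 92], seen_all))
print(ok, seen_all)     # False [85, 40]
```

Note that the short-circuit only saves work when the argument is lazy. Passing a list comprehension instead of a generator expression evaluates every element before any or all gets to look at the first one.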

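sum, min, max

sum(iterable, start=0) adds up the items of an iterable of numbers, beginning from start. min() and max() return the smallest and largest item. Both accept a key function that computes the value to compare by, and a default to return when the iterable is empty.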

data = [3, 1, 4, 1, 5, 9, 2, 6]

print(sum(data))                 # 31
print(sum(x * x for x in data))  # 173

print(min(data))           # 1
print(max(data))           # 9

# Compare by key
words = ["apple", "banana", "cherry", "date"]
print(max(words, key=len))   # banana (longest; on a tie the first match wins)
print(min(words, key=len))   # date (shortest)

# min/max with a default (handles empty input)
empty = []
print(min(empty, default=0))  # 0 instead of raising ValueError
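
A common application of the key parameter is finding the dictionary key with the largest or smallest value. The sketch below uses a made-up scores dictionary and also shows the start argument of sum:

```python
scores = {"Alice": 92, "Bob": 85, "Charlie": 78}

# Iterating a dict yields its keys; key=scores.get compares them by value.
best = max(scores, key=scores.get)
worst = min(scores, key=scores.get)

total = sum(scores.values())
average = total / len(scores)
print(best, worst, total, average)  # Alice Charlie 255 85.0

# The second argument seeds the running total.
with_bonus = sum(scores.values(), 10)
print(with_bonus)  # 265
```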

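Sorting and Reversing

sorted and reversed

sorted(iterable, key=None, reverse=False) returns a new sorted list and leaves the original iterable unmodified. reversed(seq) returns an iterator that yields the elements in reverse order.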

nums = [3, 1, 4, 1, 5]

# sorted returns a list
print(sorted(nums))                # [1, 1, 3, 4, 5]
print(sorted(nums, reverse=True))  # [5, 4, 3, 1, 1]

# Sort by a custom key
students = [("Bob", 85), ("Alice", 92), ("Charlie", 78)]
print(sorted(students, key=lambda x: x[1]))  # ascending by score
print(sorted(students, key=lambda x: x[1], reverse=True))  # descending by score

# reversed returns an iterator
for x in reversed(nums):
    print(x)  # 5, 1, 4, 1, 3

# Reverse a string
print("".join(reversed("hello")))  # olleh
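
When two records tie on the primary key, a tuple-valued key can express a secondary ordering. The following sketch sorts by score descending and then by name ascending; the extra student "Dana" is invented to create a tie.

```python
students = [("Bob", 85), ("Alice", 92), ("Charlie", 78), ("Dana", 92)]

# Negating the score sorts it descending while the name stays ascending.
ranked = sorted(students, key=lambda s: (-s[1], s[0]))
print(ranked)
# [('Alice', 92), ('Dana', 92), ('Bob', 85), ('Charlie', 78)]
```

The negation trick only works for numeric keys. For other types, one option is to sort twice, secondary key first, relying on the fact that sorted is stable.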

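Note that reversed() requires its argument to implement __reversed__() or to support the sequence protocol (__len__() plus __getitem__()). A plain iterator such as a generator has neither, so convert it to a list first.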

gen = (x for x in range(5))
# list(reversed(gen))  # TypeError: generators do not support reversed()
print(list(reversed(list(gen))))  # convert to a list first: [4, 3, 2, 1, 0]
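
The sequence-protocol requirement can be illustrated with a small user-defined class. Countdown below is a hypothetical example type, not part of any library; it defines only __len__ and __getitem__, which is enough for reversed() to work.

```python
class Countdown:
    """Hypothetical sequence holding n, n-1, ..., 1."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        if not 0 <= index < self.n:
            raise IndexError(index)
        return self.n - index

forward = list(Countdown(3))
backward = list(reversed(Countdown(3)))
print(forward)   # [3, 2, 1]
print(backward)  # [1, 2, 3]
```

Here reversed() calls len() once and then indexes from the last position down to 0, so no copy of the data is made.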

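The itertools Module

itertools provides functions dedicated to working with iterators, covering infinite sequences, combinatoric generators, grouping, and slicing or filtering.

Infinite Iterators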

import itertools

# count(start, step): infinite arithmetic progression
for i in itertools.count(10, 2):
    if i > 20:
        break
    print(i)  # 10, 12, 14, 16, 18, 20

# cycle(iterable): repeat the elements forever
for i, x in enumerate(itertools.cycle(["A", "B", "C"])):
    if i >= 5:
        break
    print(x)  # A, B, C, A, B

# repeat(elem, [times]): repeat one element, forever if times is omitted
print(list(itertools.repeat("x", 3)))  # ['x', 'x', 'x']
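
Instead of a manual break, an infinite iterator can be bounded with islice or by zipping it with a finite sequence, since zip stops at the shortest input. A sketch with made-up task and worker names:

```python
from itertools import count, cycle, islice

# Take the first five even numbers from an infinite counter.
evens = list(islice(count(0, 2), 5))
print(evens)  # [0, 2, 4, 6, 8]

# zip ends with the finite list, so cycle() never runs away.
tasks = ["parse", "lint", "test", "build"]
assignment = list(zip(tasks, cycle(["A", "B"])))
print(assignment)
# [('parse', 'A'), ('lint', 'B'), ('test', 'A'), ('build', 'B')]
```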

Combinatoric Generators

import itertools

items = ["a", "b", "c"]

# product: Cartesian product
print(list(itertools.product(items, repeat=2)))
# [('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'a'), ...]

# permutations: order matters
print(list(itertools.permutations(items, 2)))
# [('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ...]

# combinations: order does not matter
print(list(itertools.combinations(items, 2)))
# [('a', 'b'), ('a', 'c'), ('b', 'c')]

# combinations_with_replacement: elements may repeat
print(list(itertools.combinations_with_replacement(items, 2)))
# [('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'b'), ...]
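
The truncated outputs above hide how many tuples each function yields. As a quick check, the sketch below compares the actual counts with the standard counting formulas, assuming Python 3.8 or later for math.perm and math.comb:

```python
import itertools
import math

items = ["a", "b", "c"]
n, r = len(items), 2

counts = {
    "product": len(list(itertools.product(items, repeat=r))),
    "permutations": len(list(itertools.permutations(items, r))),
    "combinations": len(list(itertools.combinations(items, r))),
    "combinations_with_replacement": len(
        list(itertools.combinations_with_replacement(items, r))
    ),
}
expected = {
    "product": n ** r,                                         # 9
    "permutations": math.perm(n, r),                           # 6
    "combinations": math.comb(n, r),                           # 3
    "combinations_with_replacement": math.comb(n + r - 1, r),  # 6
}
print(counts == expected)  # True
```

These counts grow quickly with n and r, which is one reason the functions return iterators rather than lists.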

Utility Functions

import itertools

# chain: concatenate several iterables
a = [1, 2]
b = [3, 4]
print(list(itertools.chain(a, b)))  # [1, 2, 3, 4]

# islice: lazy slicing (no copy of the data is made)
big = range(1_000_000)
slice_gen = itertools.islice(big, 100, 105)
print(list(slice_gen))  # [100, 101, 102, 103, 104]

# zip_longest: align inputs of unequal length, padding with fillvalue
names = ["Alice", "Bob"]
scores = [85, 92, 78]
for name, score in itertools.zip_longest(names, scores, fillvalue="N/A"):
    print(f"{name}: {score}")
# Alice: 85
# Bob: 92
# N/A: 78

# groupby: group consecutive elements that share a key
data = [("A", 1), ("A", 2), ("B", 3), ("B", 4), ("A", 5)]
# groupby only merges adjacent elements with equal keys, so grouping data
# as-is would yield two separate "A" groups. Sort by the same key first.
sorted_data = sorted(data, key=lambda x: x[0])
for key, group in itertools.groupby(sorted_data, key=lambda x: x[0]):
    print(f"Group {key}: {list(group)}")
# Group A: [('A', 1), ('A', 2), ('A', 5)]
# Group B: [('B', 3), ('B', 4)]
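
To make the sorting requirement concrete, this sketch runs groupby on the same sample data without sorting it first. The key "A" then appears twice, once per consecutive run.

```python
from itertools import groupby

data = [("A", 1), ("A", 2), ("B", 3), ("B", 4), ("A", 5)]

# groupby starts a new group every time the key changes.
runs = [
    (key, [value for _, value in group])
    for key, group in groupby(data, key=lambda x: x[0])
]
print(runs)  # [('A', [1, 2]), ('B', [3, 4]), ('A', [5])]
```

This run-based behavior is sometimes exactly what is wanted, for example when collapsing consecutive duplicates. When one group per key is the goal, sort by the same key function before grouping.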

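Tool Selection Quick Reference

Need	Tool	Return type
Index alongside each value	`enumerate()`	iterator
Several sequences in parallel	`zip()`	iterator
Apply a function	`map()`	iterator
Filter by condition	`filter()`	iterator
Any / all elements truthy	`any()` / `all()`	`bool`
Sum / extremes	`sum()` / `min()` / `max()`	number (`sum`) / element (`min`, `max`)
Sorting	`sorted()`	`list`
Reversing	`reversed()`	iterator
Infinite counting	`itertools.count()`	iterator
Concatenating sequences	`itertools.chain()`	iterator
Lazy slicing	`itertools.islice()`	iterator
Permutations and combinations	`itertools.permutations()` / `combinations()`	iterator
zip over unequal lengths	`itertools.zip_longest()`	iterator
Grouping	`itertools.groupby()`	iterator of (key, group) pairs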