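Strings (str)

Python's str type represents an immutable sequence of Unicode characters. Strings appear everywhere in Python: user input, file contents, network responses, log output. Nearly every program has to handle text, so understanding how strings are created, how they behave internally, and how to operate on them is the foundation for writing reliable code.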

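Creating strings

A string literal can be delimited by single quotes, double quotes, or triple quotes. All three produce the same str type; which one to use depends on whether the text itself contains quote characters or spans multiple lines:

# Single quotes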
name = 'Python'

# Double quotes, convenient when the text contains a single quote
sentence = "It's a beautiful day"

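# Triple quotes, convenient for multi-line text
doc = """
Project: Data Analysis Platform
Version: 1.0
Author: Development Team
"""

A triple-quoted string keeps the line breaks exactly as typed, including the one right after the opening quotes. To avoid that leading newline, end the opening line with a backslash, which escapes the line break: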

header = """\
Usage: program [OPTIONS]
    -h        Show help
    -v        Show version
"""

The empty string is valid and is falsy in a Boolean context:

empty = ""
print(len(empty))   # 0
print(bool(empty))  # False
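
Because an empty string is falsy, a plain truth test is the usual way to check for one. The sketch below uses that idiom to skip blank lines; the sample data and the name non_blank are made up for illustration.

```python
# Illustrative input: some entries are empty or contain only whitespace.
lines = ["alpha", "", "   ", "beta"]

# strip() turns a whitespace-only string into "", which is falsy.
non_blank = [line for line in lines if line.strip()]

print(non_blank)  # ['alpha', 'beta']
```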

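Escape sequences

The backslash \ introduces special characters:

text = "first line\nsecond line"   # \n newline
table = "name\tage\tcity"          # \t tab
quote = "He said: \"Hello\""       # \" double quote
path = "C:\\Users\\Admin"          # \\ a literal backslash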

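When a string contains many backslashes (regular expressions or Windows paths, for example), use the raw string prefix r. In a raw string the backslash is no longer processed as an escape character:

# Raw strings
pattern = r"\d+\.\d+"      # regex that matches a decimal number such as 3.14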
windows_path = r"C:\new_folder\test.txt"

# Contrast: without r, \n is read as a newline and \t as a tab
wrong = "C:\new_folder\test.txt"
print(wrong)  # C:, then a line break, then ew_folder<TAB>est.txt

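Raw strings have one limitation: they cannot end with an odd number of backslashes, because the final backslash would escape the closing quote and the literal would never terminate:

# Wrong: the literal ends with a single backslash
# path = r"C:\"  # SyntaxError

# Fix: supply the trailing backslash as a separate, ordinary string literal
path = r"C:" "\\"   # C:\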
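
One way to see what the r prefix changes is to compare lengths. This is a small illustrative check; the variable names are arbitrary.

```python
escaped = "\n"    # one character: a newline
raw = r"\n"       # two characters: a backslash followed by n

print(len(escaped), len(raw))  # 1 2
print(raw == "\\n")            # True, both spell backslash + n

# A raw string cannot end in a single backslash, so the trailing one
# is supplied by an adjacent ordinary literal.
folder = r"C:\new_folder" "\\"
print(folder)                  # C:\new_folder\
```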

String immutability

A string cannot be modified once it has been created. Any operation that appears to "modify" a string actually creates a new one:

s = "hello"
# s[0] = "H"      # TypeError: 'str' object does not support item assignment

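# Correct approach: build a new string
s = "H" + s[1:]   # "Hello"

Immutability has several practical consequences: strings can be used as dictionary keys (their hash never changes), a string object can be read from multiple threads without locking, and references to it can be shared safely.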
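
The following sketch illustrates two of these consequences: a second reference keeps seeing the original text after the first name is rebound, and strings work as dictionary keys. The names alias and word_counts are illustrative.

```python
s = "hello"
alias = s              # both names refer to the same string object

s = "H" + s[1:]        # builds a new string and rebinds s

print(s)               # Hello
print(alias)           # hello, the original object is untouched
print(s is alias)      # False

# Strings are hashable, so they can serve as dictionary keys.
word_counts = {}
for word in ["a", "b", "a"]:
    word_counts[word] = word_counts.get(word, 0) + 1
print(word_counts)     # {'a': 2, 'b': 1}
```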

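Indexing and slicing

Strings support indexing to access a single character and slicing to extract a substring. Indexes start at 0, and negative indexes count back from the end: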

word = "Python"

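print(word[0])    # P, the first character
print(word[-1])   # n, the last character
print(word[-2])   # o, the second to last

The slice syntax s[start:stop:step] extracts the substring from start (inclusive) to stop (exclusive), taking every step-th character: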

word = "Python"

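print(word[0:2])   # Py, the first two characters
print(word[2:5])   # tho, indexes 2, 3, 4
print(word[:2])    # Py, from the beginning up to index 2
print(word[4:])    # on, from index 4 to the end
print(word[-2:])   # on, the last two characters
print(word[:])     # Python, the whole string
print(word[::-1])  # nohtyP, reversed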
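
The step argument is easiest to understand from a few more cases. The extra slices below are additional illustrations on the same word:

```python
word = "Python"

print(word[::2])     # Pto, every second character starting at index 0
print(word[1::2])    # yhn, every second character starting at index 1
print(word[4:1:-1])  # oht, backwards from index 4 down to (not including) index 1

# For any index i, the two halves always reassemble the original.
i = 3
print(word[:i] + word[i:] == word)  # True
```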

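Slicing never raises an out-of-range error; indexes beyond the ends are clamped to the string's bounds:

s = "hi"
print(s[0:100])   # hi, the part past the end is ignored
print(s[10:20])   # empty string

Indexing a single position out of range, however, raises IndexError: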

# s[100]  # IndexError: string index out of range
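
When an index may be out of range, either catch IndexError or use a one-character slice, which returns an empty string instead of raising. The helper char_at below is a hypothetical convenience function, not part of the standard library.

```python
def char_at(s: str, i: int, default: str = "") -> str:
    """Return s[i], or default when i is out of range (hypothetical helper)."""
    try:
        return s[i]
    except IndexError:
        return default

s = "hi"
print(repr(char_at(s, 1)))         # 'i'
print(repr(char_at(s, 100)))       # ''
print(repr(char_at(s, 100, "?")))  # '?'
print(repr(s[100:101]))            # '', slicing never raises
```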

String operations and methods

Strings support concatenation with + and repetition with *:

print("Py" + "thon")     # Python
print("=" * 20)          # ====================
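
Since every + produces a new string, joining many pieces is commonly done by collecting them in a list and calling str.join() once. A minimal sketch with made-up data:

```python
fields = ["id", "name", "email"]

# Collect the pieces first, then join them in a single call.
parts = []
for field in fields:
    parts.append(field.upper())
header = " | ".join(parts)

print(header)             # ID | NAME | EMAIL
print("-" * len(header))  # a rule of the same width as the header
```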

Adjacent string literals are concatenated automatically (this applies only to literals, not to variables or expressions):

text = ("Put several strings "
        "within parentheses "
        "to have them joined.")

The built-in methods cover searching, replacing, splitting, joining, and case conversion:

s = "  Hello, World!  "

print(s.strip())           # "Hello, World!", whitespace removed from both ends
print(s.lower())           # "  hello, world!  "
print(s.upper())           # "  HELLO, WORLD!  "
print(s.startswith("  H"))  # True
print(s.endswith("!  "))    # True
print(s.find("World"))      # 9, returns -1 if not found
print(s.index("World"))     # 9, raises ValueError if not found
print(s.replace("World", "Python"))  # "  Hello, Python!  "
print(s.split(","))         # ['  Hello', ' World!  ']
print("-".join(["a", "b", "c"]))  # a-b-c
print(s.count("l"))         # 3
print(len(s))               # 17
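
These methods are often chained. As an illustration, the sketch below parses a simple key = value line; the input format and the function name parse_setting are assumptions made for the example.

```python
def parse_setting(line: str) -> tuple[str, str]:
    """Split 'key = value' into a (key, value) pair (illustrative helper)."""
    # partition() splits at the first "=" only, so values may contain "=".
    key, sep, value = line.partition("=")
    if not sep:
        raise ValueError(f"missing '=' in {line!r}")
    return key.strip().lower(), value.strip()

print(parse_setting("  Name = Alice  "))   # ('name', 'Alice')
print(parse_setting("query=a=b"))          # ('query', 'a=b')
```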

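Called without arguments, split() splits on runs of any whitespace and discards leading and trailing whitespace, so the result contains no empty strings: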

print("  a   b   c  ".split())  # ['a', 'b', 'c']
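
Passing an explicit separator changes this behavior: every occurrence of the separator splits, so empty strings can appear in the result. The optional maxsplit argument limits the number of splits.

```python
print("a  b".split())       # ['a', 'b'], the run of spaces counts as one separator
print("a  b".split(" "))    # ['a', '', 'b'], each space splits
print(" a b ".split(" "))   # ['', 'a', 'b', '']

# maxsplit=1 splits only once; the rest stays intact.
print("a b c".split(maxsplit=1))  # ['a', 'b c']
```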

Membership tests and comparison

in and not in test whether a substring is present:

print("Py" in "Python")      # True
print("py" in "Python")      # False, the test is case-sensitive
print("java" not in "Python") # True
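
For a case-insensitive test, a common approach is to normalize both sides first. The sketch below uses str.casefold(), which is intended for caseless matching; the helper name contains_ignore_case is made up for the example.

```python
def contains_ignore_case(needle: str, haystack: str) -> bool:
    """Case-insensitive substring test (illustrative helper)."""
    return needle.casefold() in haystack.casefold()

print("py" in "Python")                        # False
print(contains_ignore_case("py", "Python"))    # True
print(contains_ignore_case("java", "Python"))  # False
```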

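Strings compare lexicographically, character by character, using Unicode code points:

print("apple" < "banana")    # True
print("Z" < "a")             # True, uppercase letters have smaller code points than lowercase
print("10" < "2")            # True, compared character by character and "1" < "2"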
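
The last comparison matters when sorting strings that hold numbers. If numeric order is wanted, one option is to sort with key=int, as sketched here with made-up data:

```python
versions = ["10", "2", "1"]

print(sorted(versions))           # ['1', '10', '2'], character-by-character order
print(sorted(versions, key=int))  # ['1', '2', '10'], numeric order
```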

String formatting

Since Python 3.6, f-strings are the recommended way to format strings. They allow expressions to be embedded directly in the string:

name = "Python"
version = 3.12

print(f"{name} {version}")
print(f"{name!r}")           # 'Python', !r calls repr()
print(f"{version:.2f}")      # 3.12
print(f"{1000000:,}")        # 1,000,000, thousands separator
print(f"{0.85:.1%}")         # 85.0%, percentage
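
The part after the colon is a format specification that can also control width, alignment, and padding. A few further illustrations, wrapped in square brackets so the padding is visible:

```python
print(f"[{'left':<8}]")     # [left    ]
print(f"[{'right':>8}]")    # [   right]
print(f"[{'mid':^8}]")      # [  mid   ]
print(f"[{42:05d}]")        # [00042]
print(f"[{3.14159:8.3f}]")  # [   3.142]
print(f"[{255:#x}]")        # [0xff]
```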

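Python 3.12 lifted some long-standing f-string restrictions: an expression inside the braces may now reuse the same quote character as the enclosing string, and it may contain backslashes:

# Allowed in Python 3.12+ only; a SyntaxError in earlier versions
value = "test"
print(f"Result: {value.replace("t", "T")}")    # double quotes reused inside the expression
print(f"Lines: {"\n".join([value, value])}")   # backslash inside the expression

Old-style % formatting and str.format() remain available, but new projects should prefer f-strings:

# % formatting (old style, not recommended for new projects)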
print("Name: %s, Age: %d" % ("Alice", 30))

# str.format()
print("Name: {}, Age: {}".format("Alice", 30))
print("Name: {name}, Age: {age}".format(name="Alice", age=30))

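Encoding and decoding

Conceptually, a Python string is a sequence of Unicode code points, independent of any particular encoding. To write it to a file or send it over a network, it must be encoded into a byte sequence:

text = "你好，世界"

# Encode to UTF-8 bytes
b = text.encode("utf-8")
print(b)  # b'\xe4\xbd\xa0\xe5\xa5\xbd\xef\xbc\x8c\xe4\xb8\x96\xe7\x95\x8c'

# Decode back to a string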
print(b.decode("utf-8"))  # 你好，世界
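
A string's length counts code points, while the length of the encoded bytes depends on the encoding. The sketch below compares the two for the same text:

```python
text = "你好，世界"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

print(len(text))    # 5 code points
print(len(utf8))    # 15 bytes, 3 per character here
print(len(utf16))   # 10 bytes, 2 per character here

# Decoding with the matching encoding restores the original text.
print(utf8.decode("utf-8") == text)       # True
print(utf16.decode("utf-16-le") == text)  # True
```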

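When a character cannot be encoded, encode() raises UnicodeEncodeError by default. The errors parameter changes this behavior:

text = "你好"
print(text.encode("ascii", errors="ignore"))     # b'', unencodable characters are dropped
print(text.encode("ascii", errors="replace"))    # b'??', each one is replaced with a question mark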
print(text.encode("ascii", errors="xmlcharrefreplace"))  # b'&#20320;&#22909;'
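
decode() accepts the same errors parameter for bytes that are not valid in the chosen encoding. A short sketch, using a deliberately invalid UTF-8 byte:

```python
data = b"abc\xff"   # 0xff can never appear in valid UTF-8

# The default (strict) mode raises an exception.
try:
    data.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
print(strict_failed)  # True

ignored = data.decode("utf-8", errors="ignore")
replaced = data.decode("utf-8", errors="replace")

print(ignored)                  # abc
print(replaced == "abc\ufffd")  # True, the bad byte became U+FFFD
```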

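Edge cases and common mistakes

Single-character strings: Python has no separate character type, so a string of length 1 is still a str. Use ord() to get a character's Unicode code point and chr() for the reverse: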

print(ord("A"))   # 65
print(chr(65))    # A
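
ord() and chr() work for any Unicode character, not only ASCII. A brief illustration:

```python
print([ord(c) for c in "Py"])   # [80, 121]
print(hex(ord("你")))           # 0x4f60
print(chr(0x4F60))              # 你

# Round trip: chr() undoes ord() for every character.
word = "Python"
print("".join(chr(ord(c)) for c in word) == word)  # True
```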

Mixing up strings and bytes:

# Wrong
# "hello" + b"world"  # TypeError: can only concatenate str (not "bytes") to str

# Correct: decode the bytes first
"hello" + b"world".decode("utf-8")   # "helloworld"

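The illusion of modifying a string:

s = "abc"
s.upper()
print(s)  # abc, upper() returns a new string and leaves the original unchanged

# Correct approach: rebind the name to the result
s = s.upper()

The difference between the empty string and None: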

s = ""
print(s is None)   # False
print(s == "")     # True
print(bool(s))     # False
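
Code that accepts optional text usually has to decide whether None and the empty string mean the same thing. The sketch below keeps them apart; the function describe is hypothetical.

```python
from typing import Optional

def describe(value: Optional[str]) -> str:
    """Distinguish a missing value from an empty one (hypothetical helper)."""
    if value is None:
        return "missing"
    if not value:
        return "empty"
    return f"text of length {len(value)}"

print(describe(None))   # missing
print(describe(""))     # empty
print(describe("abc"))  # text of length 3
```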

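Strings are among the most fundamental and most frequently used types in Python. A solid grasp of immutability, the slicing rules, and the built-in methods leads to text-processing code that is both more correct and more efficient.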