What's New In Python 3.3

This article explains the new features in Python 3.3, compared to 3.2. Python 3.3 was released on September 29, 2012. For full details, see the changelog.

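See also

PEP 398 - Python 3.3 Release Schedule

Summary -- Release highlights

New syntax features:

  • New yield from expression for generator delegation.

  • The u'unicode' syntax is accepted again for str objects.

New library modules:

  • faulthandler (helps debugging low-level crashes)

  • ipaddress (high-level objects representing IP addresses and masks)

  • lzma (compress data using the XZ / LZMA algorithm)

  • unittest.mock (replace parts of your system under test with mock objects)

  • venv (Python virtual environments, as in the popular virtualenv package)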

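New built-in features:

  • Reworked I/O exception hierarchy.

Implementation improvements:

  • Rewritten import machinery based on importlib.

  • More compact unicode strings.

  • More compact attribute dictionaries.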

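Significantly improved library modules:

  • C accelerator for the decimal module.

  • Better unicode handling in the email module (provisional).

Security improvements:

  • Hash randomization is switched on by default.

Please read on for a comprehensive list of user-facing changes.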

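PEP 405: Virtual Environments

Virtual environments help create separate Python setups while sharing a system-wide base install, for ease of maintenance. Virtual environments have their own set of private site packages (i.e. locally installed libraries), and are optionally segregated from the system-wide site packages. Their concept and implementation are inspired by the popular virtualenv third-party package, but benefit from tighter integration with the interpreter core.

This PEP adds the venv module for programmatic access, and the pyvenv script for command-line access and administration. The Python interpreter checks for a pyvenv.cfg file, whose existence signals the base of a virtual environment's directory tree.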
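As a rough illustration of the programmatic route, the sketch below builds a throwaway environment in a temporary directory and checks for the pyvenv.cfg marker. The directory name demo-env is made up for the example, and the sketch assumes the running interpreter is allowed to create an environment on the local filesystem.

```python
import os
import tempfile
import venv

# Build a throwaway environment inside a temporary directory.
# "demo-env" is a hypothetical name chosen for this sketch.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = os.path.join(tmp, "demo-env")
    venv.create(env_dir)

    # pyvenv.cfg marks the base of the environment's directory tree.
    cfg_path = os.path.join(env_dir, "pyvenv.cfg")
    has_cfg = os.path.isfile(cfg_path)
    with open(cfg_path) as cfg:
        cfg_keys = [line.split("=")[0].strip() for line in cfg if "=" in line]

print(has_cfg, cfg_keys)
```

The keys read back from pyvenv.cfg include home, which records where the base installation lives.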

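See also

PEP 405 - Python Virtual Environments

PEP written by Carl Meyer; implementation by Carl Meyer and Vinay Sajip.

PEP 420: Implicit Namespace Packages

Native support for package directories that don't require __init__.py marker files and can automatically span multiple path segments (inspired by various third-party approaches to namespace packages, as described in PEP 420).

See also

PEP 420 - Implicit Namespace Packages

PEP written by Eric V. Smith; implementation by Eric V. Smith and Barry Warsaw.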

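PEP 3118: New memoryview implementation and buffer protocol documentation

The implementation of PEP 3118 has been significantly improved.

The new memoryview implementation comprehensively fixes all ownership and lifetime issues of dynamically allocated fields in the Py_buffer struct that led to multiple crash reports. Additionally, several functions that crashed or returned incorrect results for non-contiguous or multi-dimensional input have been fixed.

The memoryview object now has a PEP-3118 compliant getbufferproc() that checks the consumer's request type. Many new features have been added, most of which work in full generality for non-contiguous arrays and arrays with suboffsets.

The documentation has been updated, clearly spelling out responsibilities for both exporters and consumers. Buffer request flags are grouped into basic and compound flags. The memory layout of non-contiguous and multi-dimensional NumPy-style arrays is explained.

Features

  • All native single character format specifiers in struct module syntax (optionally prefixed with '@') are now supported.

  • With some restrictions, the cast() method allows changing the format and shape of C-contiguous arrays.

  • Multi-dimensional list representations are supported for any array type.

  • Multi-dimensional comparisons are supported for any array type.

  • One-dimensional memoryviews of hashable (read-only) types with formats B, b or c are now hashable. (Contributed by Antoine Pitrou in bpo-13411.)

  • Arbitrary slicing of any 1-D array type is supported. For example, it is now possible to reverse a memoryview in O(1) by using a negative step.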
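A few of these features can be tried on a plain bytes object. The following is a minimal sketch, not an exhaustive tour; the 3x4 shape and the variable names are arbitrary choices for the example.

```python
data = bytes(range(12))
view = memoryview(data)              # 1-D, format 'B', read-only

# cast() reshapes a C-contiguous buffer without copying it.
grid = view.cast("B", shape=[3, 4])
rows = grid.tolist()                 # nested lists for a multi-dimensional view

# A negative step reverses the view; bytes are only copied by tobytes().
reversed_bytes = view[::-1].tobytes()

# A read-only view with format 'B' hashes like the bytes it exposes.
same_hash = hash(view) == hash(data)

# Format 'B' yields integers; cast to 'c' first to get bytes objects.
first_as_int = view[0]
first_as_bytes = view.cast("c")[0]
```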

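API changes

  • The maximum number of dimensions is officially limited to 64.

  • The representation of empty shape, strides and suboffsets is now an empty tuple instead of None.

  • Accessing a memoryview element with format 'B' (unsigned bytes) now returns an integer (in accordance with the struct module syntax). For returning a bytes object the view must be cast to 'c' first.

  • memoryview comparisons now use the logical structure of the operands and compare all array elements by value. All format strings in struct module syntax are supported. Views with unrecognised format strings are still permitted, but will always compare as unequal, regardless of view contents.

  • For further changes see Build and C API Changes and Porting C code.

(Contributed by Stefan Krah in bpo-10181.)

See also

PEP 3118 - Revising the Buffer Protocol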

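PEP 393: Flexible String Representation

The Unicode string type is changed to support multiple internal representations, depending on the character with the largest Unicode ordinal (1, 2, or 4 bytes) in the represented string. This allows a space-efficient representation in common cases, but gives access to full UCS-4 on all systems. For compatibility with existing APIs, several representations may exist in parallel; over time, this compatibility should be phased out.

On the Python side, there should be no downside to this change.

On the C API side, PEP 393 is fully backward compatible. The legacy API should remain available for at least five years. Applications using the legacy API will not fully benefit from the memory reduction, or, worse, may use a bit more memory, because Python may have to maintain two versions of each string (in the legacy format and in the new efficient storage).

Functionality

Changes introduced by PEP 393 are the following:

  • Python now always supports the full range of Unicode code points, including non-BMP ones (i.e. from U+0000 to U+10FFFF). The distinction between narrow and wide builds no longer exists and Python now behaves like a wide build, even under Windows.

  • With the death of narrow builds, the problems specific to narrow builds have also been fixed, for example:

    • len() now always returns 1 for non-BMP characters, so len('\U0010FFFF') == 1;

    • surrogate pairs are not recombined in string literals, so '\uDBFF\uDFFF' != '\U0010FFFF';

    • indexing or slicing non-BMP characters returns the expected value, so '\U0010FFFF'[0] now returns '\U0010FFFF' and not '\uDBFF';

    • all other functions in the standard library now correctly handle non-BMP code points.

  • The value of sys.maxunicode is now always 1114111 (0x10FFFF in hexadecimal). The PyUnicode_GetMax() function still returns either 0xFFFF or 0x10FFFF for backward compatibility, and it should not be used with the new Unicode API (see bpo-13054).

  • The ./configure flag --with-wide-unicode has been removed.

Performance and resource usage

The storage of Unicode strings now depends on the highest code point in the string:

  • pure ASCII and Latin1 strings (U+0000-U+00FF) use 1 byte per code point;

  • BMP strings (U+0000-U+FFFF) use 2 bytes per code point;

  • non-BMP strings (U+10000-U+10FFFF) use 4 bytes per code point.

The net effect is that for most applications, memory usage of string storage should decrease significantly (especially compared to former wide unicode builds) as, in many cases, strings will be pure ASCII even in international contexts (because many strings store non-human language data, such as XML fragments, HTTP headers, JSON-encoded data, etc.). We also hope that it will, for the same reasons, increase CPU cache efficiency on non-trivial applications. The memory usage of Python 3.3 is two to three times smaller than Python 3.2, and a little bit better than Python 2.7, on a Django benchmark (see the PEP for details).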
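The per-code-point cost can be observed from Python with sys.getsizeof(). The helper below is a hypothetical measuring function written for this sketch, and the result is specific to CPython, where the four values should come out as 1, 1, 2 and 4.

```python
import sys

def bytes_per_code_point(char, n=1000):
    """Estimate storage per character by comparing two string lengths."""
    # Subtracting the two sizes cancels the fixed object header, which
    # differs between the internal representations.
    return (sys.getsizeof(char * (2 * n)) - sys.getsizeof(char * n)) // n

ascii_cost = bytes_per_code_point("a")             # U+0061
latin1_cost = bytes_per_code_point("\xe9")          # U+00E9
bmp_cost = bytes_per_code_point("\u20ac")           # U+20AC
non_bmp_cost = bytes_per_code_point("\U0001F600")   # U+1F600

print(ascii_cost, latin1_cost, bmp_cost, non_bmp_cost)
print(len("\U0010FFFF"), hex(sys.maxunicode))
```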

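See also

PEP 393 - Flexible String Representation

PEP written by Martin von Löwis; implementation by Torsten Becker and Martin von Löwis.

PEP 397: Python Launcher for Windows

The Python 3.3 Windows installer now includes a py launcher application that can be used to launch Python applications in a version-independent fashion.

This launcher is invoked implicitly when double-clicking *.py files. If only a single Python version is installed on the system, that version will be used to run the file. If multiple versions are installed, the most recent version is used by default, but this can be overridden by including a Unix-style "shebang line" in the Python script.

The launcher can also be used explicitly from the command line as the py application. Running py follows the same version selection rules as implicitly launching scripts, but a more specific version can be selected by passing appropriate arguments (such as -3 to request Python 3 when Python 2 is also installed, or -2.6 to specifically request an earlier Python version when a more recent version is installed).

In addition to the launcher, the Windows installer now includes an option to add the newly installed Python to the system PATH. (Contributed by Brian Curtin in bpo-3561.)

See also

PEP 397 - Python Launcher for Windows

PEP written by Mark Hammond and Martin v. Löwis; implementation by Vinay Sajip.

Launcher documentation: Python Launcher for Windows

Installer PATH modification: Finding the Python executable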

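PEP 3151: Reworking the OS and IO exception hierarchy

The hierarchy of exceptions raised by operating system errors is now both simplified and finer-grained.

You don't have to worry anymore about choosing the appropriate exception type between OSError, IOError, EnvironmentError, WindowsError, mmap.error, socket.error or select.error. All these exception types are now only one: OSError. The other names are kept as aliases for compatibility reasons.

Also, it is now easier to catch a specific error condition. Instead of inspecting the errno attribute (or args[0]) for a particular constant from the errno module, you can catch the adequate OSError subclass. The available subclasses are the following:

  • BlockingIOError

  • ChildProcessError

  • ConnectionError

  • FileExistsError

  • FileNotFoundError

  • InterruptedError

  • IsADirectoryError

  • NotADirectoryError

  • PermissionError

  • ProcessLookupError

  • TimeoutError

And ConnectionError itself has finer-grained subclasses:

  • BrokenPipeError

  • ConnectionAbortedError

  • ConnectionRefusedError

  • ConnectionResetError

Thanks to the new exceptions, common usages of errno can now be avoided. For example, the following code written for Python 3.2: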

from errno import ENOENT, EACCES, EPERM

try:
    with open("document.txt") as f:
        content = f.read()
except IOError as err:
    if err.errno == ENOENT:
        print("document.txt file is missing")
    elif err.errno in (EACCES, EPERM):
        print("You are not allowed to read document.txt")
    else:
        raise

can now be written without the errno import and without manual inspection of exception attributes:

try:
    with open("document.txt") as f:
        content = f.read()
except FileNotFoundError:
    print("document.txt file is missing")
except PermissionError:
    print("You are not allowed to read document.txt")
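The aliasing and the errno-to-subclass mapping can also be checked directly, without touching the filesystem. This is a small sketch; the error messages are placeholder strings.

```python
import errno
import socket

# The legacy names are plain aliases of OSError.
aliases_match = IOError is OSError and EnvironmentError is OSError
socket_alias = socket.error is OSError

# Constructing OSError with a known errno yields the matching subclass.
missing = OSError(errno.ENOENT, "No such file or directory")
denied = OSError(errno.EACCES, "Permission denied")
refused = OSError(errno.ECONNREFUSED, "Connection refused")

print(type(missing).__name__, type(denied).__name__, type(refused).__name__)
```

Because the subclass is chosen from the errno value, code that still raises OSError(errno.ENOENT, ...) by hand remains catchable with except FileNotFoundError.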

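See also

PEP 3151 - Reworking the OS and IO Exception Hierarchy

PEP written and implemented by Antoine Pitrou.

PEP 380: Syntax for Delegating to a Subgenerator

PEP 380 adds the yield from expression, allowing a generator to delegate part of its operations to another generator. This allows a section of code containing yield to be factored out and placed in another generator. Additionally, the subgenerator is allowed to return with a value, and the value is made available to the delegating generator.

While designed primarily for use in delegating to a subgenerator, the yield from expression actually allows delegation to arbitrary subiterators.

For simple iterators, yield from iterable is essentially just a shortened form of for item in iterable: yield item: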

>>> def g(x):
...     yield from range(x, 0, -1)
...     yield from range(x)
...
>>> list(g(5))
[5, 4, 3, 2, 1, 0, 1, 2, 3, 4]

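However, unlike an ordinary loop, yield from allows subgenerators to receive sent and thrown values directly from the calling scope, and return a final value to the outer generator: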

>>> def accumulate():
...     tally = 0
...     while 1:
...         next = yield
...         if next is None:
...             return tally
...         tally += next
...
>>> def gather_tallies(tallies):
...     while 1:
...         tally = yield from accumulate()
...         tallies.append(tally)
...
>>> tallies = []
>>> acc = gather_tallies(tallies)
>>> next(acc)  # Ensure the accumulator is ready to accept values
>>> for i in range(4):
...     acc.send(i)
...
>>> acc.send(None)  # Finish the first tally
>>> for i in range(5):
...     acc.send(i)
...
>>> acc.send(None)  # Finish the second tally
>>> tallies
[6, 10]

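The main principle driving this change is to allow even generators that are designed to be used with the send and throw methods to be split into multiple subgenerators as easily as a single large function can be split into multiple subfunctions.

See also

PEP 380 - Syntax for Delegating to a Subgenerator

PEP written by Greg Ewing; implementation by Greg Ewing, integrated into 3.3 by Renaud Blanch, Ryan Kelly and Nick Coghlan; documentation by Zbigniew Jędrzejewski-Szmek and Nick Coghlan.

PEP 409: Suppressing exception context

PEP 409 introduces new syntax that allows the display of the chained exception context to be disabled. This allows cleaner error messages in applications that convert between exception types: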

>>> class D:
...     def __init__(self, extra):
...         self._extra_attributes = extra
...     def __getattr__(self, attr):
...         try:
...             return self._extra_attributes[attr]
...         except KeyError:
...             raise AttributeError(attr) from None
...
>>> D({}).x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 8, in __getattr__
AttributeError: x

Without the from None suffix to suppress the cause, the original exception would be displayed by default:

>>> class C:
...     def __init__(self, extra):
...         self._extra_attributes = extra
...     def __getattr__(self, attr):
...         try:
...             return self._extra_attributes[attr]
...         except KeyError:
...             raise AttributeError(attr)
...
>>> C({}).x
Traceback (most recent call last):
  File "<stdin>", line 6, in __getattr__
KeyError: 'x'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 8, in __getattr__
AttributeError: x

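The ability to debug is not lost, as the original exception context remains available if needed (for example, if an intervening library has incorrectly suppressed valuable underlying details):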

>>> try:
...     D({}).x
... except AttributeError as exc:
...     print(repr(exc.__context__))
...
KeyError('x',)

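See also

PEP 409 - Suppressing exception context

PEP written by Ethan Furman; implemented by Ethan Furman and Nick Coghlan.

PEP 414: Explicit Unicode literals

To ease the transition from Python 2 for Unicode-aware Python applications that make heavy use of Unicode literals, Python 3.3 once again supports the "u" prefix for string literals. This prefix has no semantic significance in Python 3; it is provided solely to reduce the number of purely mechanical changes in migrating to Python 3, making it easier for developers to focus on the more significant semantic changes (such as the stricter default separation of binary and text data).

See also

PEP 414 - Explicit Unicode literals

PEP written by Armin Ronacher.

PEP 3155: Qualified name for classes and functions

Functions and class objects have a new __qualname__ attribute representing the "path" from the module top-level to their definition. For global functions and classes, this is the same as __name__. For other functions and classes, it provides better information about where they were actually defined, and how they might be accessible from the global scope.

Example with (non-bound) methods: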

>>> class C:
...     def meth(self):
...         pass
...
>>> C.meth.__name__
'meth'
>>> C.meth.__qualname__
'C.meth'

Example with nested classes:

>>> class C:
...     class D:
...         def meth(self):
...             pass
...
>>> C.D.__name__
'D'
>>> C.D.__qualname__
'C.D'
>>> C.D.meth.__name__
'meth'
>>> C.D.meth.__qualname__
'C.D.meth'

Example with nested functions:

>>> def outer():
...     def inner():
...         pass
...     return inner
...
>>> outer().__name__
'inner'
>>> outer().__qualname__
'outer.<locals>.inner'

The string representation of those objects is also changed to include the new, more precise information:

>>> str(C.D)
"<class '__main__.C.D'>"
>>> str(C.D.meth)
'<function C.D.meth at 0x7f46b9fe31e0>'

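See also

PEP 3155 - Qualified name for classes and functions

PEP written and implemented by Antoine Pitrou.

PEP 412: Key-Sharing Dictionary

Dictionaries used for the storage of objects' attributes are now able to share part of their internal storage between each other (namely, the part which stores the keys and their respective hashes). This reduces the memory consumption of programs creating many instances of non-builtin types.

See also

PEP 412 - Key-Sharing Dictionary

PEP written and implemented by Mark Shannon.

PEP 362: Function Signature Object

A new function inspect.signature() makes introspection of Python callables easy and straightforward. A broad range of callables is supported: Python functions, decorated or not, classes, and functools.partial() objects. New classes inspect.Signature, inspect.Parameter and inspect.BoundArguments hold information about the call signatures, such as annotations, default values, parameter kinds, and bound arguments, which considerably simplifies writing decorators and any code that validates or amends calling signatures or arguments.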
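A short sketch of the three classes in use follows. The connect function and its parameters are invented for the example; only the inspect and functools calls come from the standard library.

```python
import functools
import inspect

def connect(host, port=8080, *, timeout: float = 5.0):   # hypothetical function
    return host, port, timeout

sig = inspect.signature(connect)
names = list(sig.parameters)
timeout_param = sig.parameters["timeout"]

# bind() maps call arguments to parameter names and raises TypeError
# when the call would not match the signature.
bound = sig.bind("example.org", timeout=1.0)
bound_args = dict(bound.arguments)

# functools.partial objects are supported; the pre-filled argument drops out.
partial_sig = inspect.signature(functools.partial(connect, "example.org"))
partial_names = list(partial_sig.parameters)
```

A validating decorator could call sig.bind(*args, **kwargs) up front and inspect bound.arguments before invoking the wrapped function.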

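See also

PEP 362 - Function Signature Object

PEP written by Brett Cannon, Yury Selivanov, Larry Hastings and Jiwon Seo; implemented by Yury Selivanov.

PEP 421: Adding sys.implementation

A new attribute on the sys module exposes details specific to the implementation of the currently running interpreter. The initial set of attributes on sys.implementation are name, version, hexversion, and cache_tag.

The intention of sys.implementation is to consolidate into one namespace the implementation-specific data used by the standard library. This allows different Python implementations to share a single standard library code base much more easily. In its initial state, sys.implementation holds only a small portion of the implementation-specific data. Over time that ratio will shift in order to make the standard library more portable.

One example of improved standard library portability is cache_tag. As of Python 3.3, sys.implementation.cache_tag is used by importlib to support PEP 3147 compliance. Any Python implementation that uses importlib for its built-in import system may use cache_tag to control the caching behavior for modules.

SimpleNamespace

The implementation of sys.implementation also introduces a new type to Python: types.SimpleNamespace. In contrast to a mapping-based namespace, like dict, SimpleNamespace is attribute-based, like object. However, unlike object, SimpleNamespace instances are writable. This means that you can add, remove, and modify the namespace through normal attribute access.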
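Both pieces can be explored interactively. In this sketch the host, port and debug fields are arbitrary names picked for illustration.

```python
import sys
import types

impl = sys.implementation
impl_fields = (impl.name, impl.version, impl.hexversion, impl.cache_tag)

ns = types.SimpleNamespace(host="localhost", port=8080)   # hypothetical fields
ns.port = 9090      # modify an attribute
ns.debug = True     # add an attribute
del ns.host         # remove an attribute
remaining = vars(ns)

print(impl.name, remaining)
```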

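See also

PEP 421 - Adding sys.implementation

PEP written and implemented by Eric Snow.

Using importlib as the Implementation of Import

bpo-2377 - Replace __import__ with importlib.__import__

bpo-13959 - Re-implement parts of imp in pure Python

bpo-14605 - Make import machinery explicit

bpo-14646 - Require loaders set __loader__ and __package__

The __import__() function is now powered by importlib.__import__(). This work leads to the completion of "phase 2" of PEP 302. There are multiple benefits to this change. First, it has allowed for more of the machinery powering import to be exposed instead of being implicit and hidden within the C code. It also provides a single implementation for all Python VMs supporting Python 3.3 to use, helping to end any VM-specific deviations in import semantics. And finally it eases the maintenance of import, allowing for future growth to occur.

For the common user, there should be no visible change in semantics. For those whose code currently manipulates import or calls import programmatically, the code changes that might possibly be required are covered in the Porting Python code section of this document.

New APIs

One of the large benefits of this work is the exposure of what goes into making the import statement work. That means the various importers that were once implicit are now fully exposed as part of the importlib package.

The abstract base classes defined in importlib.abc have been expanded to properly delineate between meta path finders and path entry finders by introducing importlib.abc.MetaPathFinder and importlib.abc.PathEntryFinder, respectively. The old ABC of importlib.abc.Finder is now only provided for backwards-compatibility and does not enforce any method requirements.

In terms of finders, importlib.machinery.FileFinder exposes the mechanism used to search for source and bytecode files of a module. Previously this class was an implicit member of sys.path_hooks.

For loaders, the new abstract base class importlib.abc.FileLoader helps write a loader that uses the file system as the storage mechanism for a module's code. The loaders for source files (importlib.machinery.SourceFileLoader), sourceless bytecode files (importlib.machinery.SourcelessFileLoader), and extension modules (importlib.machinery.ExtensionFileLoader) are now available for direct use.

ImportError now has name and path attributes which are set when there is relevant data to provide. The message for failed imports will also provide the full name of the module now instead of just the tail end of the module's name.

The importlib.invalidate_caches() function will now call the method with the same name on all finders cached in sys.path_importer_cache to help clean up any stored state as necessary.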
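The new ImportError attributes and the exposed loader classes can be seen with a deliberately failing import. The module name below is made up and assumed not to exist on the import path.

```python
import importlib
import importlib.machinery

# A failed import now reports which module could not be found.
try:
    import module_that_should_not_exist_3_3_demo   # hypothetical name
except ImportError as exc:
    missing_name = exc.name
    missing_path = exc.path

# The file-based loaders are ordinary classes available for direct use.
loader_names = [cls.__name__ for cls in (
    importlib.machinery.SourceFileLoader,
    importlib.machinery.SourcelessFileLoader,
    importlib.machinery.ExtensionFileLoader,
)]

# Ask every cached finder to drop stale state, for example after
# new modules were written to disk while the program was running.
importlib.invalidate_caches()
```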

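Visible Changes

For potential required changes to code, see the Porting Python code section.

Beyond the expanse of what importlib now exposes, there are other visible changes to import. The biggest is that sys.meta_path and sys.path_hooks now store all of the meta path finders and path entry hooks used by import. Previously the finders were implicit and hidden within the C code of import instead of being directly exposed. This means that one can now easily remove or change the order of the various finders to fit one's needs.

Another change is that all modules have a __loader__ attribute, storing the loader used to create the module. PEP 302 has been updated to make this attribute mandatory for loaders to implement, so in the future once 3rd-party loaders have been updated people will be able to rely on the existence of the attribute. Until such time, though, import is setting the attribute on the module post-load.

Loaders are also now expected to set the __package__ attribute from PEP 366. Once again, import itself is already setting this on all loaders from importlib and import itself is setting the attribute post-load.

None is now inserted into sys.path_importer_cache when no finder can be found on sys.path_hooks. Since imp.NullImporter is not directly exposed on sys.path_hooks it could no longer be relied upon to always be available to use as a value representing no finder found.

All other changes relate to semantic changes which should be taken into consideration when updating code for Python 3.3, and thus should be read about in the Porting Python code section of this document.

(Implementation by Brett Cannon)

Other Language Changes

Some smaller changes made to the core Python language are:

  • Added support for Unicode name aliases and named sequences. Both unicodedata.lookup() and '\N{...}' now resolve name aliases, and unicodedata.lookup() resolves named sequences too.

    (Contributed by Ezio Melotti in bpo-12753.)

  • Unicode database updated to UCD version 6.1.0

  • Equality comparisons on range() objects now return a result reflecting the equality of the underlying sequences generated by those range objects. (bpo-13201)

  • The count(), find(), rfind(), index() and rindex() methods of bytes and bytearray objects now accept an integer between 0 and 255 as their first argument.

    (Contributed by Petri Lehtinen in bpo-12170.)

  • The rjust(), ljust(), and center() methods of bytes and bytearray now accept a bytearray for the fill argument. (Contributed by Petri Lehtinen in bpo-12380.)

  • New methods have been added to list and bytearray: copy() and clear() (bpo-10516). Consequently, MutableSequence now also defines a clear() method (bpo-11388).

  • Raw bytes literals can now be written rb"..." as well as br"...".

    (Contributed by Antoine Pitrou in bpo-13748.)

  • dict.setdefault() now does only one lookup for the given key, making it atomic when used with built-in types.

    (Contributed by Filip Gruszczyński in bpo-13521.)

  • The error messages produced when a function call does not match the function signature have been significantly improved.

    (Contributed by Benjamin Peterson.)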
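Several of the smaller changes listed above fit in a few lines. The values are arbitrary; the sketch only exercises behavior described in the list.

```python
# range objects compare by the sequence they generate, not by their arguments.
empty_equal = range(0) == range(2, 1)            # both are empty
same_items = range(0, 3, 2) == range(0, 4, 2)    # both yield 0, 2

# bytes and bytearray search methods accept an integer byte value.
position = b"abc".find(98)                        # 98 == ord("b")
occurrences = bytearray(b"banana").count(ord("a"))

# list and bytearray gained copy() and clear().
original = [1, 2, 3]
duplicate = original.copy()
original.clear()

# The two raw-bytes prefixes are interchangeable.
raw_equal = rb"\n" == br"\n" == b"\\n"
```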

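A Finer-Grained Import Lock

Previous versions of CPython have always relied on a global import lock. This led to unexpected annoyances, such as deadlocks when importing a module would trigger code execution in a different thread as a side-effect. Clumsy workarounds were sometimes employed, such as the PyImport_ImportModuleNoBlock() C API function.

In Python 3.3, importing a module takes a per-module lock. This correctly serializes importation of a given module from multiple threads (preventing the exposure of incompletely initialized modules), while eliminating the aforementioned annoyances.

(Contributed by Antoine Pitrou in bpo-9260.)

Builtin functions and types

  • open() gets a new opener parameter: the underlying file descriptor for the file object is then obtained by calling opener with (file, flags). It can be used to pass custom flags such as os.O_CLOEXEC. The 'x' mode was added: open for exclusive creation, failing if the file already exists.

  • print(): added the flush keyword argument. If the flush keyword argument is true, the stream is forcibly flushed.

  • hash(): hash randomization is enabled by default, see object.__hash__() and PYTHONHASHSEED.

  • The str type gets a new casefold() method: return a casefolded copy of the string; casefolded strings may be used for caseless matching. For example, 'ß'.casefold() returns 'ss'.

  • The sequence documentation has been substantially rewritten to better explain the binary/text sequence distinction and to provide specific documentation sections for the individual builtin sequence types (bpo-4966).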
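The open(), print() and casefold() additions can be combined in one small sketch. The cloexec_opener helper and the file name are hypothetical; the opener simply adds O_CLOEXEC on platforms that define it.

```python
import io
import os
import tempfile

def cloexec_opener(file, flags):
    """Hypothetical opener that adds O_CLOEXEC where the platform defines it."""
    return os.open(file, flags | getattr(os, "O_CLOEXEC", 0))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.txt")

    # 'x' creates the file and refuses to overwrite an existing one.
    with open(path, "x") as f:
        f.write("first draft")
    try:
        open(path, "x")
    except FileExistsError:
        second_create_failed = True
    else:
        second_create_failed = False

    # The opener is called with (file, flags) and returns a file descriptor.
    with open(path, opener=cloexec_opener) as f:
        content = f.read()

# flush=True forces the stream to be flushed after printing.
stream = io.StringIO()
print("progress", file=stream, flush=True)

# casefold() enables caseless matching beyond simple lowercasing.
caseless_match = "Straße".casefold() == "STRASSE".casefold()
```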

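New Modules

faulthandler

This new debug module faulthandler contains functions to dump Python tracebacks explicitly, on a fault (a crash like a segmentation fault), after a timeout, or on a user signal. Call faulthandler.enable() to install fault handlers for the SIGSEGV, SIGFPE, SIGABRT, SIGBUS, and SIGILL signals. You can also enable them at startup by setting the PYTHONFAULTHANDLER environment variable or by using the -X faulthandler command line option.

Example of a segmentation fault on Linux: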

$ python -q -X faulthandler
>>> import ctypes
>>> ctypes.string_at(0)
Fatal Python error: Segmentation fault

Current thread 0x00007fb899f39700:
  File "/home/python/cpython/Lib/ctypes/__init__.py", line 486 in string_at
  File "<stdin>", line 1 in <module>
Segmentation fault
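The module can also be driven from code without provoking a crash. The sketch below, under the assumption that writing the report to a temporary file is acceptable, enables the handlers, dumps the current stack on demand, and disables the handlers again before the file is closed.

```python
import faulthandler
import tempfile

with tempfile.TemporaryFile() as log:
    # Send fault reports to the log file instead of sys.stderr.
    faulthandler.enable(file=log)
    was_enabled = faulthandler.is_enabled()

    # Dump the current thread's stack on demand, without any crash.
    faulthandler.dump_traceback(file=log, all_threads=False)

    # Uninstall the handlers before the file goes away.
    faulthandler.disable()
    log.seek(0)
    report = log.read().decode("utf-8", "replace")

print(report)
```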

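ipaddress

The new ipaddress module provides tools for creating and manipulating objects representing IPv4 and IPv6 addresses, networks and interfaces (i.e. an IP address associated with a specific IP subnet).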
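A brief sketch of the three object kinds follows; the addresses are taken from the ranges reserved for documentation.

```python
import ipaddress

addr = ipaddress.ip_address("192.0.2.17")
net = ipaddress.ip_network("192.0.2.0/24")
iface = ipaddress.ip_interface("192.0.2.17/24")   # an address plus its subnet

in_network = addr in net
size = net.num_addresses
iface_network = iface.network

# The same factory functions handle IPv6.
v6 = ipaddress.ip_address("2001:db8::1")
```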

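(Contributed by Google and Peter Moody in PEP 3144.)

lzma

The newly added lzma module provides data compression and decompression using the LZMA algorithm, including support for the .xz and .lzma file formats.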
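A minimal round trip through both container formats might look like this. The payload and file name are arbitrary, and the sketch assumes Python was built with lzma support.

```python
import lzma
import os
import tempfile

payload = b"spam and eggs " * 200

# One-shot, in-memory compression; the default container is .xz.
packed = lzma.compress(payload)
round_trip = lzma.decompress(packed)

# The legacy .lzma container is selected with FORMAT_ALONE.
legacy = lzma.compress(payload, format=lzma.FORMAT_ALONE)
legacy_round_trip = lzma.decompress(legacy)

# File-based access through lzma.open().
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "payload.xz")
    with lzma.open(path, "wb") as f:
        f.write(payload)
    with lzma.open(path, "rb") as f:
        from_file = f.read()
```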

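(Contributed by Nadeem Vawda and Per Øyvind Karlsen in bpo-6715.)

Improved Modules

abc

Improved support for abstract base classes containing descriptors composed with abstract methods. The recommended approach to declaring abstract descriptors is now to provide __isabstractmethod__ as a dynamically updated property. The built-in descriptors have been updated accordingly.

(Contributed by Darren Dale in bpo-11610.)

abc.ABCMeta.register() now returns the registered subclass, which means it can now be used as a class decorator (bpo-10868).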
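The decorator form of register() can be sketched as follows; Shape and Square are invented classes for the example.

```python
import abc

class Shape(metaclass=abc.ABCMeta):       # hypothetical ABC
    @abc.abstractmethod
    def area(self):
        ...

@Shape.register                            # register() returns the class it was given
class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2

is_virtual_subclass = issubclass(Square, Shape)
# Registration does not change the inheritance chain.
not_in_mro = Shape not in Square.__mro__
```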

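array

The array module supports the long long type using the q and Q type codes.

(Contributed by Oren Tirosh and Hirokazu Yamamoto in bpo-1172711.)

base64

ASCII-only Unicode strings are now accepted by the decoding functions of the base64 modern interface. For example, base64.b64decode('YWJj') returns b'abc'. (Contributed by Catalin Iacob in bpo-13641.)

binascii

In addition to the binary objects they normally accept, the a2b_ functions now all also accept ASCII-only strings as input. (Contributed by Antoine Pitrou in bpo-13637.)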
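The sketch below feeds str input to one binascii function and, for comparison, to the base64 decoder mentioned earlier. It also shows that a non-ASCII string is rejected rather than silently encoded.

```python
import base64
import binascii

from_text = base64.b64decode("YWJj")       # str input
from_bytes = base64.b64decode(b"YWJj")     # bytes input, as before
unhexed = binascii.a2b_hex("616263")       # str input to an a2b_ function

# Only ASCII-only strings are accepted.
try:
    base64.b64decode("YWJj\u00e9")
except ValueError:
    non_ascii_rejected = True
else:
    non_ascii_rejected = False
```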

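bz2

The bz2 module has been rewritten from scratch. In the process, several new features have been added:

  • New bz2.open() function: open a bzip2-compressed file in binary or text mode.

  • bz2.BZ2File can now read from and write to arbitrary file-like objects, by means of its constructor's fileobj argument.

    (Contributed by Nadeem Vawda in bpo-5863.)

  • bz2.BZ2File and bz2.decompress() can now decompress multi-stream inputs (such as those produced by the pbzip2 tool). bz2.BZ2File can now also be used to create this type of file, using the 'a' (append) mode.

    (Contributed by Nir Aides in bpo-1625.)

  • bz2.BZ2File now implements all of the io.BufferedIOBase API, except for the detach() and truncate() methods.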
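These features can be sketched together as below. One assumption to flag: the example passes the file object as the first positional argument to BZ2File, which is how current Python releases accept it; the file name is arbitrary.

```python
import bz2
import io
import os
import tempfile

# Two independently compressed streams, concatenated.
multi_stream = bz2.compress(b"first ") + bz2.compress(b"second")
joined = bz2.decompress(multi_stream)

# Writing through an in-memory file object instead of a file name.
buffer = io.BytesIO()
with bz2.BZ2File(buffer, "w") as f:
    f.write(b"in memory")
in_memory = bz2.decompress(buffer.getvalue())

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt.bz2")
    with bz2.open(path, "wt", encoding="utf-8") as f:
        f.write("caf\u00e9\n")
    # Append mode adds a second stream to the same file.
    with bz2.open(path, "at", encoding="utf-8") as f:
        f.write("second stream\n")
    with bz2.open(path, "rt", encoding="utf-8") as f:
        text = f.read()
```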

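codecs

The mbcs codec has been rewritten to correctly handle the replace and ignore error handlers on all Windows versions. The mbcs codec now supports all error handlers, instead of only replace to encode and ignore to decode.

A new Windows-only codec has been added: cp65001 (bpo-13216). It is the Windows code page 65001 (Windows UTF-8, CP_UTF8). For example, it is used by sys.stdout if the console output code page is set to cp65001 (e.g., using the chcp 65001 command).

Multibyte CJK decoders now resynchronize faster. They only ignore the first byte of an invalid byte sequence. For example, b'\xff\n'.decode('gb2312', 'replace') now returns a \n after the replacement character.

(bpo-12016)

Incremental CJK codec encoders are no longer reset at each call to their encode() methods. For example: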

>>> import codecs
>>> encoder = codecs.getincrementalencoder('hz')('strict')
>>> b''.join(encoder.encode(x) for x in '\u52ff\u65bd\u65bc\u4eba\u3002 Bye.')
b'~{NpJ)l6HK!#~} Bye.'

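This example gives b'~{Np~}~{J)~}~{l6~}~{HK~}~{!#~} Bye.' with older Python versions.

(bpo-12100)

The unicode_internal codec has been deprecated.

collections

Addition of a new ChainMap class to allow treating a number of mappings as a single unit. (Written by Raymond Hettinger for bpo-11089, made public in bpo-11297.)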

The abstract base classes have been moved to a new collections.abc module, to better differentiate between the abstract and the concrete collections classes. Aliases for ABCs are still present in the collections module to preserve existing imports. (bpo-11085)

The Counter class now supports the unary + and - operators, as well as the in-place operators +=, -=, |=, and &=. (Contributed by Raymond Hettinger in bpo-13121.)
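The ChainMap lookup order and the new Counter operators can be illustrated with a couple of small mappings. The keys and counts below are arbitrary sample data.

```python
import collections.abc
from collections import ChainMap, Counter

defaults = {"color": "red", "user": "guest"}
overrides = {"user": "admin"}

# Lookups search the mappings left to right; writes go to the first one.
settings = ChainMap(overrides, defaults)
user = settings["user"]
color = settings["color"]
settings["color"] = "blue"

stock = Counter(apples=3, pears=-1)
positive_only = +stock        # keeps only counts greater than zero
negated = -stock              # flips signs, then keeps positive counts
stock += Counter(apples=2)    # in-place add also drops non-positive counts

is_mutable_mapping = isinstance(settings, collections.abc.MutableMapping)
```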

contextlib

ExitStack now provides a solid foundation for programmatic manipulation of context managers and similar cleanup functionality. Unlike the previous contextlib.nested API (which was deprecated and removed), the new API is designed to work correctly regardless of whether context managers acquire their resources in their __init__ method (for example, file objects) or in their __enter__ method (for example, synchronisation objects from the threading module).
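A typical use is managing a number of context managers that is only known at run time. The sketch below opens a variable list of temporary files; the file names and contents are placeholders.

```python
import os
import tempfile
from contextlib import ExitStack

with tempfile.TemporaryDirectory() as tmp:
    paths = [os.path.join(tmp, "part%d.txt" % i) for i in range(3)]
    for i, path in enumerate(paths):
        with open(path, "w") as f:
            f.write("chunk %d" % i)

    cleanup_log = []
    with ExitStack() as stack:
        # Each file is closed when the stack exits, in reverse order.
        files = [stack.enter_context(open(path)) for path in paths]
        # Arbitrary cleanup callbacks can be registered as well.
        stack.callback(cleanup_log.append, "callback ran")
        contents = [f.read() for f in files]

    all_closed = all(f.closed for f in files)
```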

(bpo-13585)

crypt

Addition of salt and modular crypt format (hashing method) and the mksalt() function to the crypt module.

(bpo-10924)

curses

  • If the curses module is linked to the ncursesw library, use Unicode functions when Unicode strings or characters are passed (e.g. waddwstr()), and bytes functions otherwise (e.g. waddstr()).

  • Use the locale encoding instead of utf-8 to encode Unicode strings.

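  • curses.window has a new curses.window.encoding attribute.

  • The curses.window class has a new get_wch() method to get a wide character.

  • The curses module has a new unget_wch() function to push a wide character so the next get_wch() will return it.

(Contributed by Iñigo Serna in bpo-6755.)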

datetime

decimal

bpo-7652 - integrate fast native decimal arithmetic.

C-module and libmpdec written by Stefan Krah.

The new C version of the decimal module integrates the high speed libmpdec library for arbitrary precision correctly rounded decimal floating-point arithmetic. libmpdec conforms to IBM's General Decimal Arithmetic Specification.

Performance gains range from 10x for database applications to 100x for numerically intensive applications. These numbers are expected gains for standard precisions used in decimal floating-point arithmetic. Since the precision is user configurable, the exact figures may vary. For example, in integer bignum arithmetic the differences can be significantly higher.

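The following table is meant as an illustration. Benchmarks are available at https://www.bytereef.org/mpdecimal/quickstart.html.

             decimal.py    _decimal    speedup
  pi         42.02s        0.345s      120x
  telco      172.19s       5.68s       30x
  psycopg    3.57s         0.29s       12x

Features

  • The FloatOperation signal optionally enables stricter semantics for mixing floats and Decimals.

  • If Python is compiled without threads, the C version automatically disables the expensive thread local context machinery. In this case, the variable HAVE_THREADS is set to False.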
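The effect of trapping FloatOperation can be sketched inside a local context, so the change does not leak into the rest of the program.

```python
from decimal import Decimal, FloatOperation, localcontext

with localcontext() as ctx:
    # Default behavior: the conversion is allowed and recorded in the flags.
    lenient = Decimal(1.5)
    flag_was_set = ctx.flags[FloatOperation]

    # Stricter semantics: implicit float mixing now raises.
    ctx.traps[FloatOperation] = True
    try:
        Decimal(1.5)
    except FloatOperation:
        strict_rejected = True
    else:
        strict_rejected = False

    # Explicit conversion stays available under the strict setting.
    explicit = Decimal.from_float(1.5)
```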

API changes

  • The C module has the following context limits, depending on the machine architecture:

                  32-bit         64-bit
      MAX_PREC    425000000      999999999999999999
      MAX_EMAX    425000000      999999999999999999
      MIN_EMIN    -425000000     -999999999999999999

  • In the context templates (DefaultContext, BasicContext and ExtendedContext) the magnitude of Emax and Emin has changed to 999999.

  • The Decimal constructor in decimal.py does not observe the context limits and converts values with arbitrary exponents or precision exactly. Since the C version has internal limits, the following scheme is used: If possible, values are converted exactly, otherwise InvalidOperation is raised and the result is NaN. In the latter case it is always possible to use create_decimal() in order to obtain a rounded or inexact value.

  • The power function in decimal.py is always correctly rounded. In the C version, it is defined in terms of the correctly rounded exp() and ln() functions, but the final result is only "almost always correctly rounded".

  • In the C version, the context dictionary containing the signals is a MutableMapping. For speed reasons, flags and traps always refer to the same MutableMapping that the context was initialized with. If a new signal dictionary is assigned, flags and traps are updated with the new values, but they do not reference the RHS dictionary.

  • Pickling a Context produces a different output in order to have a common interchange format for the Python and C versions.

  • The order of arguments in the Context constructor has been changed to match the order displayed by repr().

  • The watchexp parameter in the quantize() method is deprecated.

email

Policy Framework

The email package now has a policy framework. A Policy is an object with several methods and properties that control how the email package behaves. The primary policy for Python 3.3 is the Compat32 policy, which provides backward compatibility with the email package in Python 3.2. A policy can be specified when an email message is parsed by a parser, or when a Message object is created, or when an email is serialized using a generator. Unless overridden, a policy passed to a parser is inherited by all the Message object and sub-objects created by the parser. By default a generator will use the policy of the Message object it is serializing. The default policy is compat32.

The minimum set of controls implemented by all policy objects are:

max_line_length

The maximum length, excluding the linesep character(s), individual lines may have when a Message is serialized. Defaults to 78.

linesep

The character used to separate individual lines when a Message is serialized. Defaults to \n.

cte_type

7bit or 8bit. 8bit applies only to a Bytes generator, and means that non-ASCII may be used where allowed by the protocol (or where it exists in the original input).

raise_on_defect

Causes a parser to raise an error when defects are encountered instead of adding them to the Message object's defects list.

A new policy instance, with new settings, is created using the clone() method of policy objects. clone takes any of the above controls as keyword arguments. Any control not specified in the call retains its default value. Thus you can create a policy that uses \r\n linesep characters like this:

mypolicy = compat32.clone(linesep='\r\n')

Policies can be used to make the generation of messages in the format needed by your application simpler. Instead of having to remember to specify linesep='\r\n' in all the places you call a generator, you can specify it once, when you set the policy used by the parser or the Message, whichever your program uses to create Message objects. On the other hand, if you need to generate messages in multiple forms, you can still specify the parameters in the appropriate generator call. Or you can have custom policy instances for your different cases, and pass those in when you create the generator.
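As a sketch of the "specify it once" approach, the policy below is attached when the message is parsed, and the generator then picks it up from the Message without any linesep argument. The message text is a made-up example.

```python
import io
from email import message_from_string
from email.generator import Generator
from email.policy import compat32

# Clone the default policy, overriding only the line separator.
crlf_policy = compat32.clone(linesep="\r\n")

# The policy is set once, at parse time.
msg = message_from_string("Subject: hello\n\nbody\n", policy=crlf_policy)

# The generator inherits the policy from the Message it serializes.
out = io.StringIO()
Generator(out).flatten(msg)
serialized = out.getvalue()
```

Controls that were not named in the clone() call, such as max_line_length, keep the values of the policy being cloned.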

Provisional Policy with New Header API

While the policy framework is worthwhile all by itself, the main motivation for introducing it is to allow the creation of new policies that implement new features for the email package in a way that maintains backward compatibility for those who do not use the new policies. Because the new policies introduce a new API, we are releasing them in Python 3.3 as a provisional policy. Backwards incompatible changes (up to and including removal of the code) may occur if deemed necessary by the core developers.

The new policies are instances of EmailPolicy, and add the following additional controls:

refold_source

Controls whether or not headers parsed by a parser are refolded by the generator. It can be none, long, or all. The default is long, which means that source headers with a line longer than max_line_length get refolded. none means no lines get refolded, and all means that all lines get refolded.

header_factory

A callable that takes a name and value and produces a custom header object.

The header_factory is the key to the new features provided by the new policies. When one of the new policies is used, any header retrieved from a Message object is an object produced by the header_factory, and any time you set a header on a Message it becomes an object produced by header_factory. All such header objects have a name attribute equal to the header name. Address and Date headers have additional attributes that give you access to the parsed data of the header. This means you can now do things like this:

>>> m = Message(policy=SMTP)
>>> m['To'] = 'Éric <foo@example.com>'
>>> m['to']
'Éric <foo@example.com>'
>>> m['to'].addresses
(Address(display_name='Éric', username='foo', domain='example.com'),)
>>> m['to'].addresses[0].username
'foo'
>>> m['to'].addresses[0].display_name
'Éric'
>>> m['Date'] = email.utils.localtime()
>>> m['Date'].datetime
datetime.datetime(2012, 5, 25, 21, 39, 24, 465484, tzinfo=datetime.timezone(datetime.timedelta(-1, 72000), 'EDT'))
>>> m['Date']
'Fri, 25 May 2012 21:44:27 -0400'
>>> print(m)
To: =?utf-8?q?=C3=89ric?= <foo@example.com>
Date: Fri, 25 May 2012 21:44:27 -0400

You will note that the unicode display name is automatically encoded as utf-8 when the message is serialized, but that when the header is accessed directly, you get the unicode version. This eliminates any need to deal with the email.header decode_header() or make_header() functions.

You can also create addresses from parts:

>>> m['cc'] = [Group('pals', [Address('Bob', 'bob', 'example.com'),
...                           Address('Sally', 'sally', 'example.com')]),
...            Address('Bonzo', addr_spec='bonz@laugh.com')]
>>> print(m)
To: =?utf-8?q?=C3=89ric?= <foo@example.com>
Date: Fri, 25 May 2012 21:44:27 -0400
cc: pals: Bob <bob@example.com>, Sally <sally@example.com>;, Bonzo <bonz@laugh.com>

Decoding to unicode is done automatically:

>>> m2 = message_from_string(str(m))
>>> m2['to']
'Éric <foo@example.com>'

When you parse a message, you can use the addresses and groups attributes of the header objects to access the groups and individual addresses:

>>> m2['cc'].addresses
(Address(display_name='Bob', username='bob', domain='example.com'), Address(display_name='Sally', username='sally', domain='example.com'), Address(display_name='Bonzo', username='bonz', domain='laugh.com'))
>>> m2['cc'].groups
(Group(display_name='pals', addresses=(Address(display_name='Bob', username='bob', domain='example.com'), Address(display_name='Sally', username='sally', domain='example.com'))), Group(display_name=None, addresses=(Address(display_name='Bonzo', username='bonz', domain='laugh.com'),)))

In summary, if you use one of the new policies, header manipulation works the way it ought to: your application works with unicode strings, and the email package transparently encodes and decodes the unicode to and from the RFC standard Content Transfer Encodings.

Other API Changes

New BytesHeaderParser, added to the parser module to complement HeaderParser and complete the Bytes API.

New utility functions:

ftplib

  • ftplib.FTP now accepts a source_address keyword argument to specify the (host, port) to use as the source address in the bind call when creating the outgoing socket. (Contributed by Giampaolo Rodolà in bpo-8594.)

  • The FTP_TLS class now provides a new ccc() function to revert control channel back to plaintext. This can be useful to take advantage of firewalls that know how to handle NAT with non-secure FTP without opening fixed ports. (Contributed by Giampaolo Rodolà in bpo-12139.)

  • Added ftplib.FTP.mlsd() method which provides a parsable directory listing format and deprecates ftplib.FTP.nlst() and ftplib.FTP.dir(). (Contributed by Giampaolo Rodolà in bpo-11072.)

functools

The functools.lru_cache() decorator now accepts a typed keyword argument (that defaults to False). When set to true, it ensures that values of different types that compare equal are cached in separate cache slots. (Contributed by Raymond Hettinger in bpo-13227.)
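The difference shows up in the cache statistics. In this sketch the two area functions are hypothetical and take two arguments each, so that int and float calls with equal values produce equal cache keys in the untyped case.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def area(width, height):
    return width * height

@lru_cache(maxsize=None, typed=True)
def typed_area(width, height):
    return width * height

# Equal values of different types share one slot without typed=True ...
area(3, 4)
shared_result = area(3.0, 4.0)

# ... and get separate slots with it.
typed_area(3, 4)
separate_result = typed_area(3.0, 4.0)

untyped_info = area.cache_info()
typed_info = typed_area.cache_info()
```

Note that the untyped cache hands back the int result computed for the first call, which is exactly the behavior typed=True is meant to avoid.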

gc

It is now possible to register callbacks invoked by the garbage collector before and after collection using the new callbacks list.
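A callback receives the phase name and a dict of details. The sketch below records both phases of one explicit collection; the record function is a made-up name.

```python
import gc

events = []

def record(phase, info):
    # phase is "start" or "stop"; info describes the collection.
    events.append((phase, info["generation"]))

gc.callbacks.append(record)
try:
    gc.collect()              # explicit full collection (generation 2)
finally:
    gc.callbacks.remove(record)

phases = [phase for phase, _ in events]
```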

hmac

A new compare_digest() function has been added to prevent side channel attacks on digests through timing analysis. (Contributed by Nick Coghlan and Christian Heimes in bpo-15061.)
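In practice the function replaces == when checking a received digest against the expected one. The key and messages below are placeholder values.

```python
import hashlib
import hmac

key = b"shared-secret"                      # hypothetical key
message = b"amount=100&to=alice"

expected = hmac.new(key, message, hashlib.sha256).hexdigest()
received = hmac.new(key, message, hashlib.sha256).hexdigest()
tampered = hmac.new(key, b"amount=900&to=alice", hashlib.sha256).hexdigest()

# compare_digest() avoids leaking, through timing, where two digests differ.
genuine_ok = hmac.compare_digest(expected, received)
tampered_ok = hmac.compare_digest(expected, tampered)
```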

http

http.server.BaseHTTPRequestHandler now buffers the headers and writes them all at once when end_headers() is called. A new method flush_headers() can be used to directly manage when the accumulated headers are sent. (Contributed by Andrew Schaaf in bpo-3709.)

http.server now produces valid HTML 4.01 strict output. (Contributed by Ezio Melotti in bpo-13295.)

http.client.HTTPResponse now has a readinto() method, which means it can be used as an io.RawIOBase class. (Contributed by John Kuhn in bpo-13464.)

html

html.parser.HTMLParser is now able to parse broken markup without raising errors, therefore the strict argument of the constructor and the HTMLParseError exception are now deprecated. The ability to parse broken markup is the result of a number of bug fixes that are also available on the latest bug fix releases of Python 2.7/3.2. (Contributed by Ezio Melotti in bpo-15114, and bpo-14538, bpo-13993, bpo-13960, bpo-13358, bpo-1745761, bpo-755670, bpo-13357, bpo-12629, bpo-1200313, bpo-670664, bpo-13273, bpo-12888, bpo-7311.)

A new html5 dictionary that maps HTML5 named character references to the equivalent Unicode character(s) (e.g. html5['gt;'] == '>') has been added to the html.entities module. The dictionary is now also used by HTMLParser. (Contributed by Ezio Melotti in bpo-11113 and bpo-15156.)

imaplib

The IMAP4_SSL constructor now accepts an SSLContext parameter to control parameters of the secure channel.

(Contributed by Sijin Joseph in bpo-8808.)

inspect

A new getclosurevars() function has been added. This function reports the current binding of all names referenced from the function body and where those names were resolved, making it easier to verify correct internal state when testing code that relies on stateful closures.

(Contributed by Meador Inge and Nick Coghlan in bpo-13062.)

A new getgeneratorlocals() function has been added. This function reports the current binding of local variables in the generator's stack frame, making it easier to verify correct internal state when testing generators.

(Contributed by Meador Inge in bpo-15153.)
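Both functions are handy in tests that need to look inside otherwise hidden state. The counter and generator below are small invented examples.

```python
import inspect

def make_counter(start):
    count = start
    def step():
        nonlocal count
        count += 1
        return count
    return step

step = make_counter(10)
step()
closure_state = inspect.getclosurevars(step).nonlocals

def running_total():
    total = 0
    for i in range(3):
        total += i
        yield total

gen = running_total()
next(gen)
next(gen)
# Snapshot of the suspended generator's local variables.
generator_state = dict(inspect.getgeneratorlocals(gen))
```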

io

The open() function has a new 'x' mode that can be used to exclusively create a new file, and raise a FileExistsError if the file already exists. It is based on the C11 'x' mode to fopen().

(Contributed by David Townshend in bpo-12760.)

The constructor of the TextIOWrapper class has a new write_through optional argument. If write_through is True, calls to write() are guaranteed not to be buffered: any data written on the TextIOWrapper object is immediately handled to its underlying binary buffer.

itertools

accumulate() now takes an optional func argument for providing a user-supplied binary function.
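With a binary function supplied, accumulate() covers running maxima, products and similar folds. The readings list is sample data.

```python
import operator
from itertools import accumulate

readings = [3, 1, 4, 1, 5, 9, 2, 6]

running_sum = list(accumulate(readings))                 # default: addition
running_max = list(accumulate(readings, max))
running_product = list(accumulate([1, 2, 3, 4, 5], operator.mul))
```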

logging

The basicConfig() function now supports an optional handlers argument taking an iterable of handlers to be added to the root logger.

A class level attribute append_nul has been added to SysLogHandler to allow control of the appending of the NUL (\000) byte to syslog records, since for some daemons it is required while for others it is passed through to the log.

math

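The math module has a new function, log2(), which returns the base-2 logarithm of x.

(Written by Mark Dickinson in bpo-11888.)

mmap

The read() method is now more compatible with other file-like objects: if the argument is omitted or specified as None, it returns the bytes from the current file position to the end of the mapping. (Contributed by Petri Lehtinen in bpo-12021.)

multiprocessing

The new multiprocessing.connection.wait() function allows polling multiple objects (such as connections, sockets and pipes) with a timeout. (Contributed by Richard Oudkerk in bpo-12328.)

multiprocessing.Connection objects can now be transferred over multiprocessing connections. (Contributed by Richard Oudkerk in bpo-4892.)

multiprocessing.Process now accepts a daemon keyword argument to override the default behavior of inheriting the daemon flag from the parent process (bpo-6064).

New attribute multiprocessing.Process.sentinel allows a program to wait on multiple Process objects at one time using the appropriate OS primitives (for example, select on posix systems).

New methods multiprocessing.pool.Pool.starmap() and starmap_async() provide itertools.starmap() equivalents to the existing multiprocessing.pool.Pool.map() and map_async() functions. (Contributed by Hynek Schlawack in bpo-12708.)

nntplib

The nntplib.NNTP class now supports the context management protocol to unconditionally consume socket.error exceptions and to close the NNTP connection when done: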

>>> from nntplib import NNTP
>>> with NNTP('news.gmane.org') as n:
...     n.group('gmane.comp.python.committers')
...
('211 1755 1 1755 gmane.comp.python.committers', 1755, 1, 1755, 'gmane.comp.python.committers')
>>>

(Contributed by Giampaolo Rodolà in bpo-9795.)

os

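  • The os module has a new pipe2() function that makes it possible to create a pipe with the O_CLOEXEC or O_NONBLOCK flags set atomically. This is especially useful to avoid race conditions in multi-threaded programs.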

  • The os module has a new sendfile() function which provides an efficient "zero-copy" way for copying data from one file (or socket) descriptor to another. The phrase "zero-copy" refers to the fact that all of the copying of data between the two descriptors is done entirely by the kernel, with no copying of data into userspace buffers. sendfile() can be used to efficiently copy data from a file on disk to a network socket, e.g. for downloading a file.

    (Patch submitted by Ross Lagerwall and Giampaolo Rodolà in bpo-10882.)

  • To avoid race conditions like symlink attacks and issues with temporary files and directories, it is more reliable (and also faster) to manipulate file descriptors instead of file names. Python 3.3 enhances existing functions and introduces new functions to work on file descriptors (bpo-4761, bpo-10755 and bpo-14626).

  • access() accepts an effective_ids keyword argument to turn on using the effective uid/gid rather than the real uid/gid in the access check. Platform support for this can be checked via the supports_effective_ids set.

  • The os module has two new functions: getpriority() and setpriority(). They can be used to get or set process niceness/priority in a fashion similar to os.nice() but extended to all processes instead of just the current one.

    (Patch submitted by Giampaolo Rodolà in bpo-10784.)

  • The new os.replace() function allows cross-platform renaming of a file with overwriting the destination. With os.rename(), an existing destination file is overwritten under POSIX, but raises an error under Windows. (Contributed by Antoine Pitrou in bpo-8828.)

  • The stat family of functions (stat(), fstat(), and lstat()) now support reading a file's timestamps with nanosecond precision. Symmetrically, utime() can now write file timestamps with nanosecond precision. (Contributed by Larry Hastings in bpo-14127.)

  • The new os.get_terminal_size() function queries the size of the terminal attached to a file descriptor. See also shutil.get_terminal_size(). (Contributed by Zbigniew Jędrzejewski-Szmek in bpo-13609.)
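Two of the items above, os.replace() and the nanosecond timestamp fields, can be sketched as follows. The temporary directory and the file names settings.new and settings.cfg are assumptions made for the example.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
staged = os.path.join(workdir, "settings.new")  # hypothetical names
live = os.path.join(workdir, "settings.cfg")

with open(live, "w") as f:
    f.write("old\n")
with open(staged, "w") as f:
    f.write("new\n")

# Overwrites the destination on both POSIX and Windows.
os.replace(staged, live)

with open(live) as f:
    current = f.read()

# Integer timestamp in nanoseconds, new alongside the float st_mtime.
mtime_ns = os.stat(live).st_mtime_ns
```

Writing to a staging file and then calling os.replace() is a common way to update a file without readers ever seeing a half-written version.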

pdb

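Tab-completion is now available not only for command names, but also for their arguments. For example, for the break command, function and file names are completed.

(Contributed by Georg Brandl in bpo-14210.)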

pickle

pickle.Pickler objects now have an optional dispatch_table attribute allowing per-pickler reduction functions to be set.

(Contributed by Richard Oudkerk in bpo-14166.)

pydoc

The Tk GUI and the serve() function have been removed from the pydoc module: pydoc -g and serve() were deprecated in Python 3.2.

re

str regular expressions now support \u and \U escapes.

(Contributed by Serhiy Storchaka in bpo-3665.)
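As a small sketch, a raw-string pattern can now leave the \u escape for the regular expression engine to interpret. The sample text is arbitrary.

```python
import re

# In a raw string the backslash sequence reaches the re module unchanged,
# which now interprets \u00e9 itself as the character U+00E9.
pattern = re.compile(r"caf\u00e9")

match = pattern.search("un caf\u00e9 noir")
found = match.group(0) if match else None
```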

sched

  • run() now accepts a blocking parameter which when set to false makes the method execute the scheduled events due to expire soonest (if any) and then return immediately. This is useful in case you want to use the scheduler in non-blocking applications. (Contributed by Giampaolo Rodolà in bpo-13449.)

  • The scheduler class can now be safely used in multi-threaded environments. (Contributed by Josiah Carlson and Giampaolo Rodolà in bpo-8684.)

  • The timefunc and delayfunc parameters of the scheduler class constructor are now optional and default to time.time() and time.sleep() respectively. (Contributed by Chris Clark in bpo-13245.)

  • The argument parameter of enter() and enterabs() is now optional. (Contributed by Chris Clark in bpo-13245.)

  • enter() and enterabs() now accept a kwargs parameter. (Contributed by Chris Clark in bpo-13245.)
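The changes above can be combined in a short sketch: a scheduler built with default arguments, an event that passes keyword arguments, and a non-blocking run(). The record function and the event labels are hypothetical.

```python
import sched

calls = []

def record(label, *, suffix=""):
    # Hypothetical action that remembers what was executed.
    calls.append(label + suffix)

scheduler = sched.scheduler()  # timefunc and delayfunc left at their defaults

# Due immediately; uses both positional and keyword arguments.
scheduler.enter(0, 1, record, argument=("now",), kwargs={"suffix": "!"})
# Due in an hour; stays queued during the non-blocking run below.
scheduler.enter(3600, 1, record, argument=("later",))

# Runs only the events that are already due, then returns without sleeping.
scheduler.run(blocking=False)
pending = len(scheduler.queue)
```

Calling run(blocking=False) periodically from an existing event loop is one way to use the scheduler without dedicating a thread to it.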

select

Solaris and derivative platforms have a new class select.devpoll for high performance asynchronous sockets via /dev/poll. (Contributed by Jesús Cea Avión in bpo-6397.)

shlex

The previously undocumented helper function quote from the pipes module has been moved to the shlex module and documented. quote() properly escapes all characters in a string that might be otherwise given special meaning by the shell.
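A minimal sketch of quoting an untrusted value before placing it in a shell command line. The file name is a contrived example, and the command string is only built, not executed.

```python
import shlex

filename = "my file; rm -rf ~"  # contrived, hostile-looking input

# The quoted value is treated by the shell as a single literal argument.
command = "ls -l " + shlex.quote(filename)

# Strings made only of safe characters are returned unchanged.
plain = shlex.quote("safe_name.txt")
```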

shutil

  • New functions:

    • disk_usage(): provides total, used and free disk space statistics. (Contributed by Giampaolo Rodolà in bpo-12442.)

    • chown(): allows one to change user and/or group of the given path also specifying the user/group names and not only their numeric ids. (Contributed by Sandro Tosi in bpo-12191.)

    • shutil.get_terminal_size(): returns the size of the terminal window to which the interpreter is attached. (Contributed by Zbigniew Jędrzejewski-Szmek in bpo-13609.)

  • copy2() and copystat() now preserve file timestamps with nanosecond precision on platforms that support it. They also preserve file "extended attributes" on Linux. (Contributed by Larry Hastings in bpo-14127 and bpo-15238.)

  • Several functions now take an optional symlinks argument: when that parameter is true, symlinks aren't dereferenced and the operation instead acts on the symlink itself (or creates one, if relevant). (Contributed by Hynek Schlawack in bpo-12715.)

  • When copying files to a different file system, move() now handles symlinks the way the posix mv command does, recreating the symlink rather than copying the target file contents. (Contributed by Jonathan Niehof in bpo-9993.) move() now also returns the dst argument as its result.

  • rmtree() is now resistant to symlink attacks on platforms which support the new dir_fd parameter in os.open() and os.unlink(). (Contributed by Martin von Löwis and Hynek Schlawack in bpo-4489.)
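Two of the new shutil functions listed above can be sketched as follows. The path "." and the fallback size of 80x24 are assumptions for the example.

```python
import shutil

# Named tuple with total, used and free space in bytes.
usage = shutil.disk_usage(".")
free_fraction = usage.free / usage.total

# Falls back to the given size when no terminal is attached.
columns, lines = shutil.get_terminal_size(fallback=(80, 24))
```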

signal

smtpd

The smtpd module now supports RFC 5321 (extended SMTP) and RFC 1870 (size extension). Per the standard, these extensions are enabled if and only if the client initiates the session with an EHLO command.

(Initial EHLO support by Alberto Trevino. Size extension by Juhana Jauhiainen. Substantial additional work on the patch contributed by Michele Orrù and Dan Boswell. bpo-8739)

smtplib

The SMTP, SMTP_SSL, and LMTP classes now accept a source_address keyword argument to specify the (host, port) to use as the source address in the bind call when creating the outgoing socket. (Contributed by Paulo Scardine in bpo-11281.)

SMTP now supports the context management protocol, allowing an SMTP instance to be used in a with statement. (Contributed by Giampaolo Rodolà in bpo-11289.)

The SMTP_SSL constructor and the starttls() method now accept an SSLContext parameter to control parameters of the secure channel. (Contributed by Kasun Herath in bpo-8809.)

socket

socketserver

BaseServer now has an overridable method service_actions() that is called by the serve_forever() method in the service loop. ForkingMixIn now uses this to clean up zombie child processes. (Contributed by Justin Warkentin in bpo-11109.)

sqlite3

The new sqlite3.Connection method set_trace_callback() can be used to capture a trace of all SQL commands processed by sqlite. (Contributed by Torsten Landschoff in bpo-11688.)

ssl

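  • The ssl module has two new random generation functions:

    • RAND_bytes(): generate cryptographically strong pseudo-random bytes.

    • RAND_pseudo_bytes(): generate pseudo-random bytes.

    (Contributed by Victor Stinner in bpo-12049.)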

  • The ssl module now exposes a finer-grained exception hierarchy in order to make it easier to inspect the various kinds of errors. (Contributed by Antoine Pitrou in bpo-11183.)

  • load_cert_chain() now accepts a password argument to be used if the private key is encrypted. (Contributed by Adam Simpkins in bpo-12803.)

  • Diffie-Hellman key exchange, both regular and Elliptic Curve-based, is now supported through the load_dh_params() and set_ecdh_curve() methods. (Contributed by Antoine Pitrou in bpo-13626 and bpo-13627.)

  • SSL sockets have a new get_channel_binding() method allowing the implementation of certain authentication mechanisms such as SCRAM-SHA-1-PLUS. (Contributed by Jacek Konieczny in bpo-12551.)

  • You can query the SSL compression algorithm used by an SSL socket, thanks to its new compression() method. The new attribute OP_NO_COMPRESSION can be used to disable compression. (Contributed by Antoine Pitrou in bpo-13634.)

  • Support has been added for the Next Protocol Negotiation extension using the ssl.SSLContext.set_npn_protocols() method. (Contributed by Colin Marc in bpo-14204.)

  • SSL errors can now be introspected more easily thanks to library and reason attributes. (Contributed by Antoine Pitrou in bpo-14837.)

  • The get_server_certificate() function now supports IPv6. (Contributed by Charles-François Natali in bpo-11811.)

  • New attribute OP_CIPHER_SERVER_PREFERENCE allows setting SSLv3 server sockets to use the server's cipher ordering preference rather than the client's (bpo-13635).

stat

The undocumented tarfile.filemode function has been moved to stat.filemode(). It can be used to convert a file's mode to a string of the form '-rwxrwxrwx'.

(Contributed by Giampaolo Rodolà in bpo-14807.)
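For illustration, with mode values chosen by hand rather than read from a real file:

```python
import stat

regular = stat.filemode(0o100755)    # '-rwxr-xr-x'
directory = stat.filemode(0o040755)  # 'drwxr-xr-x'
```

In practice the argument would typically be the st_mode field of an os.stat() result.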

struct

The struct module now supports ssize_t and size_t via the new codes n and N, respectively. (Contributed by Antoine Pitrou in bpo-3163.)
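A short round-trip sketch of the new codes. Note that n and N are only available in native mode (no prefix, or the '@' prefix), so the packed size depends on the platform.

```python
import struct

# ssize_t (signed) followed by size_t (unsigned), native size and alignment.
packed = struct.pack("nN", -1, 1)
signed_value, unsigned_value = struct.unpack("nN", packed)

same_width = struct.calcsize("n") == struct.calcsize("N")
```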

subprocess

Command strings can now be bytes objects on posix platforms. (Contributed by Victor Stinner in bpo-8513.)

A new constant DEVNULL allows suppressing output in a platform-independent fashion. (Contributed by Ross Lagerwall in bpo-5870.)
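As a sketch, the child process below prints a line and exits with status 3. Its output is discarded while the exit status is still reported. The child command is an assumption made up for the example.

```python
import subprocess
import sys

# Hypothetical noisy child: prints a line, then exits with status 3.
child = [sys.executable, "-c", "print('noisy'); raise SystemExit(3)"]

returncode = subprocess.call(
    child,
    stdout=subprocess.DEVNULL,  # discard standard output
    stderr=subprocess.DEVNULL,  # discard standard error
)
```

This replaces the earlier idiom of opening os.devnull by hand and remembering to close it.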

sys

The sys module has a new thread_info named tuple holding information about the thread implementation (bpo-11223).

tarfile

tarfile now supports lzma encoding via the lzma module. (Contributed by Lars Gustäbel in bpo-5689.)

tempfile

tempfile.SpooledTemporaryFile's truncate() method now accepts a size parameter. (Contributed by Ryan Kelly in bpo-9957.)

textwrap

The textwrap module has a new indent() that makes it straightforward to add a common prefix to selected lines in a block of text (bpo-13857).
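A minimal sketch of indent(), first with the default behavior, which skips lines consisting only of whitespace, and then with a predicate that selects every line. The sample text and the "> " prefix are arbitrary.

```python
import textwrap

text = "first\n\nsecond\n"

# Default: the empty line is left without a prefix.
quoted = textwrap.indent(text, "> ")

# A predicate returning True for every line prefixes the empty line too.
quoted_all = textwrap.indent(text, "> ", predicate=lambda line: True)
```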

threading

threading.Condition, threading.Semaphore, threading.BoundedSemaphore, threading.Event, and threading.Timer, all of which used to be factory functions returning a class instance, are now classes and may be subclassed. (Contributed by Éric Araujo in bpo-10968.)

The threading.Thread constructor now accepts a daemon keyword argument to override the default behavior of inheriting the daemon flag value from the parent thread (bpo-6064).

The formerly private function _thread.get_ident is now available as the public function threading.get_ident(). This eliminates several cases of direct access to the _thread module in the stdlib. Third party code that used _thread.get_ident should likewise be changed to use the new public interface.
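The daemon keyword argument and the public get_ident() function can be sketched together. The worker function and the seen dictionary are hypothetical names used only here.

```python
import threading

seen = {}

def worker():
    # Record the identity and daemon flag of the thread running this code.
    seen["ident"] = threading.get_ident()
    seen["daemon"] = threading.current_thread().daemon

# The daemon flag is set in the constructor instead of being inherited.
t = threading.Thread(target=worker, daemon=True)
t.start()
t.join()
```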

time

PEP 418 added new functions to the time module:

  • get_clock_info(): Get information on a clock.

  • monotonic(): Monotonic clock (cannot go backward), not affected by system clock updates.

  • perf_counter(): Performance counter with the highest available resolution to measure a short duration.

  • process_time(): Sum of the system and user CPU time of the current process.
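The four functions listed above can be sketched as follows. The workload being timed is an arbitrary placeholder.

```python
import time

# Metadata about a clock: implementation, resolution, monotonic, adjustable.
info = time.get_clock_info("monotonic")

# perf_counter() is intended for measuring short durations.
start = time.perf_counter()
sum(range(100000))  # placeholder workload
elapsed = time.perf_counter() - start

# monotonic() never goes backward, even if the system clock is changed.
t0 = time.monotonic()
t1 = time.monotonic()

# CPU time (system + user) consumed by the current process so far.
cpu_seconds = time.process_time()
```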

Other new functions:

To improve cross platform consistency, sleep() now raises a ValueError when passed a negative sleep value. Previously this was an error on posix, but produced an infinite sleep on Windows.

types

Add a new types.MappingProxyType class: Read-only proxy of a mapping. (bpo-14386)

The new functions types.new_class() and types.prepare_class() provide support for PEP 3115 compliant dynamic type creation. (bpo-14588)
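Both additions can be illustrated briefly. The settings dictionary and the Point class are hypothetical names chosen for the sketch.

```python
import types

settings = {"debug": False}
view = types.MappingProxyType(settings)

# Writing through the proxy is rejected...
try:
    view["debug"] = True
except TypeError:
    read_only = True
else:
    read_only = False

# ...but changes made to the underlying mapping remain visible through it.
settings["debug"] = True
seen_through_view = view["debug"]

# Dynamic class creation; the callback populates the class namespace.
Point = types.new_class("Point", (), {}, lambda ns: ns.update(x=0, y=0))
```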

unittest

assertRaises(), assertRaisesRegex(), assertWarns(), and assertWarnsRegex() now accept a keyword argument msg when used as context managers. (Contributed by Ezio Melotti and Winston Ewert in bpo-10775.)

unittest.TestCase.run() now returns the TestResult object.
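A small sketch of both changes: a context-manager assertion carrying a msg, and the TestResult returned by run(). The ParseTest class is a hypothetical test case.

```python
import unittest

class ParseTest(unittest.TestCase):
    def test_rejects_text(self):
        # msg is reported if the block does not raise ValueError.
        with self.assertRaises(ValueError, msg="int() should reject text"):
            int("not a number")

# run() creates a default TestResult when none is passed, and returns it.
result = ParseTest("test_rejects_text").run()
```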

urllib

The Request class now accepts a method argument used by get_method() to determine what HTTP method should be used. For example, this will send a 'HEAD' request:

>>> urlopen(Request('https://www.pyth.onl', method='HEAD'))

(bpo-1673007)
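The effect of the method argument can also be checked without any network traffic, as in the sketch below. The URL is a placeholder, and no request is sent because urlopen() is never called.

```python
from urllib.request import Request

head_request = Request("https://www.example.org/", method="HEAD")
default_request = Request("https://www.example.org/")

head_method = head_request.get_method()        # 'HEAD'
default_method = default_request.get_method()  # 'GET' when there is no data
```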

webbrowser

The webbrowser module supports more "browsers": Google Chrome (named chrome, chromium, chrome-browser or chromium-browser depending on the version and operating system), and the generic launchers xdg-open, from the FreeDesktop.org project, and gvfs-open, which is the default URI handler for GNOME 3. (The former contributed by Arnaud Calmettes in bpo-13620, the latter by Matthias Klose in bpo-14493.)

xml.etree.ElementTree

The xml.etree.ElementTree module now imports its C accelerator by default; there is no longer a need to explicitly import xml.etree.cElementTree (this module stays for backwards compatibility, but is now deprecated). In addition, the iter family of methods of Element has been optimized (rewritten in C). The module's documentation has also been greatly improved with added examples and a more detailed reference.

zlib

New attribute zlib.Decompress.eof makes it possible to distinguish between a properly formed compressed stream and an incomplete or truncated one. (Contributed by Nadeem Vawda in bpo-12646.)

New attribute zlib.ZLIB_RUNTIME_VERSION reports the version string of the underlying zlib library that is loaded at runtime. (Contributed by Torsten Landschoff in bpo-12306.)
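A minimal sketch of using eof to detect a truncated stream. The payload and the 10-byte split point are arbitrary.

```python
import zlib

original = b"hello world" * 100
payload = zlib.compress(original)

d = zlib.decompressobj()
data = d.decompress(payload[:10])
truncated_eof = d.eof   # False: the end of the stream has not been seen

data += d.decompress(payload[10:])
complete_eof = d.eof    # True: the end-of-stream marker was reached

runtime_version = zlib.ZLIB_RUNTIME_VERSION
```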

Optimizations

Major performance enhancements have been added:

  • Thanks to PEP 393, some operations on Unicode strings have been optimized:

    • the memory footprint is divided by 2 to 4 depending on the text

    • encoding an ASCII string to UTF-8 no longer needs to encode characters, because the UTF-8 representation is shared with the ASCII representation

    • the UTF-8 encoder has been optimized

    • repeating a single ASCII letter and getting a substring of an ASCII string is 4 times faster

  • UTF-8 encoding is now 2 to 4 times faster. UTF-16 encoding is now up to 10 times faster.

    (Contributed by Serhiy Storchaka in bpo-14624, bpo-14738 and bpo-15026.)

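Build and C API Changes

Changes to Python's build process and to the C API include:

Deprecated

Unsupported Operating Systems

OS/2 and VMS are no longer supported due to the lack of a maintainer.

Windows 2000 and Windows platforms which set COMSPEC to command.com are no longer supported due to maintenance burden.

OSF support, which was deprecated in 3.2, has been completely removed.

Deprecated Python modules, functions and methods

Deprecated functions and types of the C API

Py_UNICODE has been deprecated by PEP 393 and will be removed in Python 4. All functions using this type are deprecated:

Unicode functions and methods using Py_UNICODE and Py_UNICODE* types: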

Functions and macros manipulating Py_UNICODE* strings:

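Encoders:

Deprecated features

The array module's 'u' format code is now deprecated and will be removed in Python 4 together with the rest of the (Py_UNICODE) API.

Porting to Python 3.3

This section lists previously described changes and other bugfixes that may require changes to your code.

Porting Python code

  • Hash randomization is enabled by default. Set the PYTHONHASHSEED environment variable to 0 to disable hash randomization. See also the object.__hash__() method.

  • bpo-12326: On Linux, sys.platform no longer contains the major version. It is now always 'linux', instead of 'linux2' or 'linux3' depending on the Linux version used to build Python. Replace sys.platform == 'linux2' with sys.platform.startswith('linux'), or directly with sys.platform == 'linux' if you don't need to support older Python versions.

  • bpo-13847, bpo-14180: time and datetime: OverflowError is now raised instead of ValueError if a timestamp is out of range. OSError is now raised if the C functions gmtime() or localtime() fail.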

  • The default finders used by import now utilize a cache of what is contained within a specific directory. If you create a Python source file or sourceless bytecode file, make sure to call importlib.invalidate_caches() to clear out the cache for the finders to notice the new file.

  • ImportError now uses the full name of the module that was attempted to be imported. Doctests that check ImportErrors' message will need to be updated to use the full name of the module instead of just the tail of the name.

  • The level argument to __import__() now defaults to 0 instead of -1 and no longer supports negative values. It was an oversight when PEP 328 was implemented that the default value remained -1. If you need to continue to perform a relative import followed by an absolute import, then perform the relative import using a level of 1, followed by another import using a level of 0. It is preferred, though, that you use importlib.import_module() rather than call __import__() directly.

  • __import__() no longer allows one to use a level value other than 0 for top-level modules. E.g. __import__('sys', level=1) is now an error.

  • Because sys.meta_path and sys.path_hooks now have finders on them by default, you will most likely want to use list.insert() instead of list.append() to add to those lists.

  • Because None is now inserted into sys.path_importer_cache, if you are clearing out entries in the dictionary of paths that do not have a finder, you will need to remove keys paired with values of None and imp.NullImporter to be backwards-compatible. This will lead to extra overhead on older versions of Python that re-insert None into sys.path_importer_cache where it represents the use of implicit finders, but semantically it should not change anything.

  • importlib.abc.Finder no longer specifies a find_module() abstract method that must be implemented. If you were relying on subclasses to implement that method, make sure to check for the method's existence first. You will probably want to check for find_loader() first, though, in the case of working with path entry finders.

  • pkgutil has been converted to use importlib internally. This eliminates many edge cases where the old behaviour of the PEP 302 import emulation failed to match the behaviour of the real import system. The import emulation itself is still present, but is now deprecated. The pkgutil.iter_importers() and pkgutil.walk_packages() functions special case the standard import hooks so they are still supported even though they do not provide the non-standard iter_modules() method.

  • A longstanding RFC-compliance bug (bpo-1079) in the parsing done by email.header.decode_header() has been fixed. Code that uses the standard idiom to convert encoded headers into unicode (str(make_header(decode_header(h)))) will see no change, but code that looks at the individual tuples returned by decode_header will see that whitespace that precedes or follows ASCII sections is now included in the ASCII section. Code that builds headers using make_header should also continue to work without change, since make_header continues to add whitespace between ASCII and non-ASCII sections if it is not already present in the input strings.

  • email.utils.formataddr() now does the correct content transfer encoding when passed non-ASCII display names. Any code that depended on the previous buggy behavior that preserved the non-ASCII unicode in the formatted output string will need to be changed (bpo-1690608).

  • poplib.POP3.quit() may now raise protocol errors like all other poplib methods. Code that assumes quit does not raise poplib.error_proto errors may need to be changed if errors on quit are encountered by a particular application (bpo-11291).

  • The strict argument to email.parser.Parser, deprecated since Python 2.4, has finally been removed.

  • The deprecated method unittest.TestCase.assertSameElements has been removed.

  • The deprecated variable time.accept2dyear has been removed.

  • The deprecated Context._clamp attribute has been removed from the decimal module. It was previously replaced by the public attribute clamp. (See bpo-8540.)

  • The undocumented internal helper class SSLFakeFile has been removed from smtplib, since its functionality has long been provided directly by socket.socket.makefile().

  • Passing a negative value to time.sleep() on Windows now raises an error instead of sleeping forever. It has always raised an error on posix.

  • The ast.__version__ constant has been removed. If you need to make decisions affected by the AST version, use sys.version_info to make the decision.

  • Code that used to work around the fact that the threading module used factory functions by subclassing the private classes will need to change to subclass the now-public classes.

  • The undocumented debugging machinery in the threading module has been removed, simplifying the code. This should have no effect on production code, but is mentioned here in case any application debug frameworks were interacting with it (bpo-13550).
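The directory-cache item in the list above can be sketched as follows: a module file is created while the interpreter is running, so the finder caches are invalidated before importing it. The temporary directory and the module name generated_plugin are hypothetical.

```python
import importlib
import os
import sys
import tempfile

plugin_dir = tempfile.mkdtemp()
sys.path.insert(0, plugin_dir)

# Create a new source file after the interpreter has started.
with open(os.path.join(plugin_dir, "generated_plugin.py"), "w") as f:
    f.write("ANSWER = 42\n")

# Make sure the finders notice the newly written file.
importlib.invalidate_caches()

plugin = importlib.import_module("generated_plugin")
```

Without the invalidate_caches() call the import may still succeed in many cases, but it is not guaranteed when the directory contents were already cached.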

Porting C code

  • In the course of changes to the buffer API the undocumented smalltable member of the Py_buffer structure has been removed and the layout of the PyMemoryViewObject has changed.

    All extensions relying on the relevant parts in memoryobject.h or object.h must be rebuilt.

  • Due to PEP 393, the Py_UNICODE type and all functions using this type are deprecated (but will stay available for at least five years). If you were using low-level Unicode APIs to construct and access unicode objects and you want to benefit from the memory footprint reduction provided by PEP 393, you have to convert your code to the new Unicode API.

    However, if you only have been using high-level functions such as PyUnicode_Concat(), PyUnicode_Join() or PyUnicode_FromFormat(), your code will automatically take advantage of the new unicode representations.

  • PyImport_GetMagicNumber() now returns -1 upon failure.

  • As a negative value for the level argument to __import__() is no longer valid, the same now holds for PyImport_ImportModuleLevel(). This also means that the value of level used by PyImport_ImportModuleEx() is now 0 instead of -1.

Building C extensions

  • The range of possible file names for C extensions has been narrowed. Very rarely used spellings have been suppressed: under POSIX, files named xxxmodule.so, xxxmodule.abi3.so and xxxmodule.cpython-*.so are no longer recognized as implementing the xxx module. If you had been generating such files, you have to switch to the other spellings (i.e., remove the module string from the file names).

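    (Implemented in bpo-14040.)

Command Line Switch Changes

  • The -Q command-line flag and related artifacts have been removed. Code checking sys.flags.division_warning will need updating.

    (bpo-10998, contributed by Éric Araujo.)

  • When python is started with -S, import site will no longer add site-specific paths to the module search paths. In previous versions, it did.

    (Contributed by Carl Meyer in bpo-11591, with editing by Éric Araujo.)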