Python Utilities

In this section, we look at a few of Python's many standard utility modules to solve common problems.

File System -- os, os.path, shutil

The *os* and *os.path* modules include many functions to interact with the file system. The *shutil* module can copy files.

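os module docs
filenames = os.listdir(dir) -- list of filenames in that directory path (not including . and ..). The filenames are just the names in the directory, not their absolute paths.
os.path.join(dir, filename) -- given a filename from the above list, use this to put the dir and filename together to make a path
os.path.abspath(path) -- given a path, return an absolute form, e.g. /home/nick/foo/bar.html
os.path.dirname(path), os.path.basename(path) -- given dir/foo/bar.html, return the dirname "dir/foo" and basename "bar.html"
os.path.exists(path) -- true if it exists
os.mkdir(dir_path) -- makes one dir, os.makedirs(dir_path) makes all the needed dirs in this path
shutil.copy(source-path, dest-path) -- copy a file (the dest path directories should already exist)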

import os

## Example pulls filenames from a dir, prints their relative and absolute paths
def printdir(dir):
  filenames = os.listdir(dir)
  for filename in filenames:
    print(filename)  ## foo.txt
    print(os.path.join(dir, filename)) ## dir/foo.txt (relative to current dir)
    print(os.path.abspath(os.path.join(dir, filename))) ## /home/nick/dir/foo.txt
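The remaining functions from the list above (os.path.exists, os.makedirs, os.path.basename, shutil.copy) can be combined in a similar way. The following is a minimal sketch, not part of the original material: the helper name backup_file and the file names are made up for illustration, and a temporary directory is used so that the example does not touch real files.

```python
import os
import shutil
import tempfile

def backup_file(path, backup_dir):
    """Copy the file at path into backup_dir and return the copy's absolute path."""
    if not os.path.exists(backup_dir):
        os.makedirs(backup_dir)  # also creates any missing parent dirs
    dest = os.path.join(backup_dir, os.path.basename(path))
    shutil.copy(path, dest)      # backup_dir exists at this point
    return os.path.abspath(dest)

# Usage: build a throwaway file inside a temporary directory and back it up.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, 'foo.txt')
with open(source, 'w') as f:
    f.write('hello')

copied = backup_file(source, os.path.join(workdir, 'backups', 'today'))
print(os.path.dirname(copied))   # the .../backups/today directory
print(os.path.basename(copied))  # foo.txt
```

os.makedirs is used here instead of os.mkdir because the nested "backups/today" path needs two directories created at once.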

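Exploring a module works well with the built-in Python help() and dir() functions. In the interpreter, do an "import os", and then use these commands to look at what's available in the module: dir(os), help(os.listdir), dir(os.path), help(os.path.dirname).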

Running External Processes -- subprocess

The *subprocess* module is a simple way to run an external command and capture its output.

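subprocess module docs
output = subprocess.check_output(cmd, stderr=subprocess.STDOUT) -- runs the command, waits for it to exit, and returns its output as bytes (call .decode() on it to get text). With stderr=subprocess.STDOUT, the command's standard output and standard error are combined into the one output. If the command exits with a non-zero status, it raises CalledProcessError.
If you want more control over the running of the sub-process, see the subprocess.Popen class
There is also a simple subprocess.call(cmd) which runs the command, lets its output go to your program's own output, and returns its exit code. This works if you want to run the command but do not need to capture its output into your Python data structures.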

import subprocess
import sys

## Given a dir path, run an external 'ls -l' on it --
## shows how to call an external program
def listdir(dir):
  cmd = 'ls -l ' + dir
  print("Command to run:", cmd)   ## good to debug cmd before actually running it
  (status, output) = subprocess.getstatusoutput(cmd)
  if status:    ## Error case, print the command's output to stderr and exit
    sys.stderr.write(output)
    sys.exit(status)
  print(output)  ## Otherwise do something with the command's output
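The check_output() call described above can be used in much the same way. The sketch below is an illustration under a few assumptions: the helper name run_python_snippet is made up, and the external command is the Python interpreter itself (sys.executable) so that the example does not depend on 'ls' being installed. The command is passed as a list of arguments, which avoids going through a shell.

```python
import subprocess
import sys

def run_python_snippet(code):
    """Run code in a child Python process and return its combined output as text."""
    cmd = [sys.executable, '-c', code]
    try:
        output = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as e:
        # e.returncode is the exit status, e.output is whatever the command printed
        return 'failed with status %d: %s' % (e.returncode, e.output.decode())
    return output.decode()  # check_output returns bytes, so decode to text

result_ok = run_python_snippet('print("hello")')
result_bad = run_python_snippet('import sys; sys.exit(3)')
print(result_ok)
print(result_bad)
```

Unlike getstatusoutput(), a failing command shows up here as an exception rather than as a status value to check.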

Exceptions

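An exception represents a run-time error that halts the normal execution at a particular line and transfers control to error handling code. This section just introduces the most basic uses of exceptions. For example, a run-time error might be that a variable used in the program does not have a value (NameError .. you've probably seen that one a few times), or a file open operation error because a file does not exist (IOError). Learn more in the exceptions tutorial and see the entire exception list.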

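Without any error handling code (as we have done thus far), a run-time exception just halts the program with an error message. That's a good default behavior, and you've seen it many times. You can add a "try/except" structure to your code to handle exceptions, like this: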

  try:
    ## Either of these two lines could throw an IOError, say
    ## if the file does not exist or the read() encounters a low level error.
    f = open(filename, 'rb')
    data = f.read()
    f.close()
  except IOError:
    ## Control jumps directly to here if any of the above lines throws IOError.
    sys.stderr.write('problem reading:' + filename)
  ## In any case, the code then continues with the line after the try/except

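The try: section includes the code which might throw an exception. The except: section holds the code to run if there is an exception. If there is no exception, the except: section is skipped (that is, that code is for error handling only, not the "normal" case for the code). You can get a pointer to the exception object itself with the syntax "except IOError as e: .." (e points to the exception object).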
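As a small illustration of the "except IOError as e" form, here is a sketch that includes the exception's own message in the error output. The function name read_file and the missing file name are made up for this example, and it uses a "with" statement to close the file, which is a swap from the explicit f.close() shown earlier. In Python 3, IOError is an alias of OSError, so this also catches the FileNotFoundError raised for a missing file.

```python
import os
import sys
import tempfile

def read_file(filename):
    """Return the file's contents as bytes, or None if it cannot be read."""
    try:
        with open(filename, 'rb') as f:
            return f.read()
    except IOError as e:
        # e is the exception object; converting it to a string gives the message
        sys.stderr.write('problem reading %s: %s\n' % (filename, e))
        return None

# Usage: a path inside a fresh temporary directory, so the file cannot exist.
missing = os.path.join(tempfile.mkdtemp(), 'no-such-file.txt')
data = read_file(missing)
print(data)
```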

HTTP -- urllib.request and urllib.parse

The module *urllib.request* provides url fetching, making a url look like a file you can read from. The *urllib.parse* module can take apart and put together urls.

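urllib.request module docs
ufile = urllib.request.urlopen(url) -- returns a file-like object for that url
text = ufile.read() -- can read from it, like a file (readlines() etc. also work)
info = ufile.info() -- the meta info for that request. info.get_content_type() is the mime type, e.g. "text/html"
baseurl = ufile.geturl() -- gets the "base" url for the request, which may be different from the original because of redirects
urllib.request.urlretrieve(url, filename) -- downloads the url data to the given file path
urllib.parse.urljoin(baseurl, url) -- given a url that may or may not be full, and the baseurl of the page it comes from, return a full url. Use geturl() above to provide the base url.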

All the exceptions are in urllib.error.
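To make the urljoin() behavior concrete, here is a short sketch that needs no network access. The example.com and example.org urls are placeholders chosen for illustration. In real code the base url would come from ufile.geturl().

```python
from urllib.parse import urljoin, urlparse

# Placeholder base url, standing in for the value returned by ufile.geturl()
baseurl = 'http://example.com/dir/page.html'

relative = urljoin(baseurl, 'image.png')   # resolved against the page's directory
rooted = urljoin(baseurl, '/top.html')     # resolved against the site root
full = urljoin(baseurl, 'http://other.example.org/x')  # already full, returned as-is

print(relative)  # http://example.com/dir/image.png
print(rooted)    # http://example.com/top.html
print(full)      # http://other.example.org/x

# urlparse() takes a url apart into named pieces
parts = urlparse(baseurl)
print(parts.scheme, parts.netloc, parts.path)  # http example.com /dir/page.html
```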

from urllib.request import urlopen

## Given a url, try to retrieve it. If it's text/html,
## print its base url and its text.
def wget(url):
  ufile = urlopen(url)  ## get file-like object for url
  info = ufile.info()   ## meta-info about the url content
  if info.get_content_type() == 'text/html':
    print('base url:' + ufile.geturl())
    text = ufile.read()  ## read all its text
    print(text)

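The above code works fine, but does not include error handling if a url does not work for some reason. Here's a version of the function which adds try/except logic to print an error message if the url operation fails.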

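If urlopen() appears to be hanging, your system may not allow direct access to some http addresses. You can verify that by trying to fetch the same url using wget or curl. If these programs also fail, you'll need to fetch http content through a proxy service. Configuring proxy access is not covered by this tutorial.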

## Version that uses try/except to print an error message if the
## urlopen() fails.
def wget2(url):
  try:
    ufile = urlopen(url)
    if ufile.info().get_content_type() == 'text/html':
      print(ufile.read())
  except IOError:
    print('problem reading url:', url)

Exercise

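To practice the file system and external-commands material, see the Copy Special Exercise. To practice the urllib material, see the Log Puzzle Exercise.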