Use convolutional neural networks (CNNs) with complex images

Written by Laurence Moroney

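1. Before you begin

In this codelab, you'll use TensorFlow to build a CNN and train it to classify images as containing either a horse or a human.

Prerequisites

If you've never built convolutions with TensorFlow before, you may want to complete the Build convolutions and perform pooling codelab, which introduces convolutions and pooling, and the Build convolutional neural networks (CNNs) to enhance computer vision codelab, which discusses how to make computers more efficient at recognizing images.

What you'll learn

How to train computers to recognize features in an image where the subject isn't clear

What you'll build

A convolutional neural network that can distinguish between pictures of horses and humans

What you'll need

You can find the code for the rest of the codelab running in Colab.

You'll also need TensorFlow installed, along with the libraries you installed in the previous codelab.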

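2. Get started: Acquire the data

You'll do this by building a horses-or-humans classifier that tells you whether a given image contains a horse or a human. The network is trained to recognize the features that distinguish the two. You'll have to do some processing of the data before you can train.

First, download the data: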

!wget --no-check-certificate https://storage.googleapis.com/laurencemoroney-blog.appspot.com/horse-or-human.zip  -O /tmp/horse-or-human.zip

The following Python code imports the os library, which gives you access to the file system, and the zipfile library, which lets you unzip the data.

import os
import zipfile

local_zip = '/tmp/horse-or-human.zip'
# The context manager closes the archive even if extraction fails
with zipfile.ZipFile(local_zip, 'r') as zip_ref:
    zip_ref.extractall('/tmp/horse-or-human')

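The contents of the ZIP file are extracted to the base directory /tmp/horse-or-human, which contains horses and humans subdirectories.

In short, the training set is the data that tells the neural network model "this is what a horse looks like" and "this is what a human looks like."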

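3. Use ImageDataGenerator to label and prepare the data

You do not explicitly label the images as horses or humans.

Later, you'll see a class called ImageDataGenerator being used. It reads images from subdirectories and automatically labels them with the name of the subdirectory they came from. For example, your training directory contains a horses directory and a humans directory. ImageDataGenerator labels the images accordingly, which saves you a coding step.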
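The directory-based labeling can be sketched in plain Python. This is only an illustration, not the Keras implementation: infer_class_indices is a hypothetical helper, and the sorted ordering mirrors the documented behavior that flow_from_directory assigns label indices to subdirectory names in alphanumeric order.

```python
import os
import tempfile


def infer_class_indices(base_dir):
    """Map each subdirectory name to an integer label, in sorted order.

    Hypothetical helper for illustration; not part of Keras.
    """
    class_names = sorted(
        name for name in os.listdir(base_dir)
        if os.path.isdir(os.path.join(base_dir, name))
    )
    return {name: index for index, name in enumerate(class_names)}


# Demonstrate on a throwaway directory that mimics the dataset layout.
with tempfile.TemporaryDirectory() as base_dir:
    for class_name in ("humans", "horses"):
        os.makedirs(os.path.join(base_dir, class_name))
    class_indices = infer_class_indices(base_dir)

print(class_indices)  # {'horses': 0, 'humans': 1}
```

Once the real generator exists, its class_indices attribute reports the mapping that Keras actually used, which is worth checking before interpreting predictions.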

Define each of those directories.

# Directory with our training horse pictures
train_horse_dir = os.path.join('/tmp/horse-or-human', 'horses')

# Directory with our training human pictures
train_human_dir = os.path.join('/tmp/horse-or-human', 'humans')

Now, see what the file names look like in the horses and humans training directories:

train_horse_names = os.listdir(train_horse_dir)
print(train_horse_names[:10])
train_human_names = os.listdir(train_human_dir)
print(train_human_names[:10])

Find the total number of horse and human images in the directories:

print('total training horse images:', len(os.listdir(train_horse_dir)))
print('total training human images:', len(os.listdir(train_human_dir)))

5. Define the model

Start defining the model.

Start by importing TensorFlow:

import tensorflow as tf

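Then, add convolutional layers and flatten the final result to feed into the densely connected layers. Finally, add the densely connected layers.

Note: Because this is a two-class (binary) classification problem, the network ends with a sigmoid activation. Its output is a single scalar between 0 and 1 that encodes the probability that the current image is class 1 (as opposed to class 0).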

model = tf.keras.models.Sequential([
    # The input shape is the desired image size, 300x300, with 3 color channels (one byte each)
    # This is the first convolution
    tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(300, 300, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The second convolution
    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # The third convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # The fourth convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # The fifth convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # Flatten the results to feed into a DNN
    tf.keras.layers.Flatten(),
    # 512 neuron hidden layer
    tf.keras.layers.Dense(512, activation='relu'),
    # Only 1 output neuron. It holds a value from 0 to 1: 0 for one class ('horses') and 1 for the other ('humans')
    tf.keras.layers.Dense(1, activation='sigmoid')
])
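To make the note about the sigmoid output concrete, here is a minimal sketch of how a raw score from the last neuron becomes a probability and then a label. The helper names are illustrative, and the mapping of class 1 to humans is an assumption taken from the comment on the final Dense layer.

```python
import math


def sigmoid(z):
    """Squash a raw score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def label_from_score(z, threshold=0.5):
    """Return (label, probability of class 1) for a raw score z.

    Assumes class 1 is 'human' and class 0 is 'horse'.
    """
    p_human = sigmoid(z)
    label = "human" if p_human > threshold else "horse"
    return label, p_human


for z in (-3.0, 0.0, 3.0):
    label, p = label_from_score(z)
    print(f"score={z:+.1f}  p(human)={p:.3f}  ->  {label}")
```

Large negative scores map close to 0 (horse), large positive scores map close to 1 (human), and a score of 0 sits exactly on the 0.5 boundary.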

The model.summary() method call prints a summary of the network.

model.summary()

You'll see output like the following:

Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 298, 298, 16)      448
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 149, 149, 16)      0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 147, 147, 32)      4640
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 73, 73, 32)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 71, 71, 64)        18496
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 35, 35, 64)        0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 33, 33, 64)        36928
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 16, 16, 64)        0
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 14, 14, 64)        36928
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 7, 7, 64)          0
_________________________________________________________________
flatten (Flatten)            (None, 3136)              0
_________________________________________________________________
dense (Dense)                (None, 512)               1606144
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 513
=================================================================
Total params: 1,704,097
Trainable params: 1,704,097
Non-trainable params: 0

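The output shape column shows how the size of your feature map evolves in each successive layer. Because the convolution layers apply no padding (the Keras default, padding='valid'), each 3x3 convolution trims 2 pixels from the height and the width, and each pooling layer halves both dimensions, rounding down.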
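The shapes and parameter counts in the summary can be reproduced with a little arithmetic. The sketch below assumes the Keras defaults used by the model (3x3 kernels, stride 1, no padding, 2x2 pooling); the helper functions are illustrative and not part of any library.

```python
def conv_valid(size, kernel=3):
    """Output width/height of an unpadded, stride-1 convolution."""
    return size - kernel + 1


def max_pool(size, pool=2):
    """Output width/height of non-overlapping max pooling (rounds down)."""
    return size // pool


def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return (inputs + 1) * units


size, channels = 300, 3
total_params = 0
for filters in (16, 32, 64, 64, 64):
    total_params += conv_params(3, channels, filters)
    size = max_pool(conv_valid(size))
    channels = filters
    print(f"after conv + pool with {filters} filters: {size}x{size}x{channels}")

flat = size * size * channels
total_params += dense_params(flat, 512) + dense_params(512, 1)
print("flattened length:", flat)          # 3136
print("total parameters:", total_params)  # 1704097
```

Nearly all of the parameters (1,606,144 of 1,704,097) sit in the first dense layer, which is why shrinking the feature map before flattening matters.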

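6. Compile the model

Next, configure the specifications for model training. Train your model with the binary_crossentropy loss because this is a binary classification problem and the final activation is a sigmoid. (For a refresher on loss metrics, see Descending into ML.) Use the rmsprop optimizer with a learning rate of 0.001. During training, monitor classification accuracy.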

from tensorflow.keras.optimizers import RMSprop

model.compile(loss='binary_crossentropy',
              optimizer=RMSprop(learning_rate=0.001),
              metrics=['acc'])
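For intuition about what the binary_crossentropy loss measures, here is a small hand-rolled version. It is a sketch of the standard formula, not the Keras source; the clipping constant eps is an assumption added to avoid taking log(0).

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]   # mostly right and fairly sure
unsure = [0.5, 0.5, 0.5, 0.5]      # always on the fence

print(binary_crossentropy(labels, confident))
print(binary_crossentropy(labels, unsure))
```

Predictions that are both correct and confident give a loss near 0, while always answering 0.5 gives log(2), roughly 0.693, which is a useful reference point when reading the training log.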

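7. Train the model from generators

Set up data generators that read pictures from your source folders, convert them to float32 tensors, and feed them (with their labels) to your network.

In this codelab you set up one generator, for the training images. It yields batches of 300x300 images together with their binary labels.

As you may already know, data that goes into neural networks should usually be normalized in some way to make it easier for the network to process. (It's uncommon to feed raw pixels into a CNN.) In this codelab, you preprocess your images by normalizing the pixel values to the [0, 1] range (originally all values are in the [0, 255] range).

In Keras, that can be done with the keras.preprocessing.image.ImageDataGenerator class and its rescale parameter. The ImageDataGenerator class lets you instantiate generators of augmented image batches (and their labels) through .flow(data, labels) or .flow_from_directory(directory). Those generators can then be used with the Keras model methods that accept data generators as inputs: fit_generator, evaluate_generator, and predict_generator. In recent TensorFlow 2.x releases, fit, evaluate, and predict accept generators directly and the *_generator variants are deprecated, which is why the training step below calls model.fit.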

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# All images will be rescaled by 1./255
train_datagen = ImageDataGenerator(rescale=1./255)

# Flow training images in batches of 128 using train_datagen generator
train_generator = train_datagen.flow_from_directory(
        '/tmp/horse-or-human/',  # This is the source directory for training images
        target_size=(300, 300),  # All images will be resized to 300x300
        batch_size=128,
        # Since we use binary_crossentropy loss, we need binary labels
        class_mode='binary')
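The rescale=1./255 setting amounts to multiplying every pixel value by the same factor. A minimal sketch on a tiny made-up image (the pixel values are invented for illustration, and rescale here is a plain function, not the Keras parameter itself):

```python
def rescale(pixels, factor=1.0 / 255):
    """Multiply every pixel by factor, mapping [0, 255] onto [0, 1]."""
    return [[value * factor for value in row] for row in pixels]


# A made-up 2x3 grayscale patch with values in [0, 255].
raw = [[0, 64, 128],
       [191, 255, 32]]

scaled = rescale(raw)
for row in scaled:
    print([round(value, 3) for value in row])
```

Any image you later pass to the trained model needs the same scaling, otherwise its inputs will be 255 times larger than anything the network saw during training.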

8. Do the training

Train for 15 epochs. (That might take a few minutes to run.)

history = model.fit(
      train_generator,
      steps_per_epoch=8,
      epochs=15,
      verbose=1)
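A quick way to sanity-check steps_per_epoch is to compare it with the number of batches needed to see every image once. The image count below is an assumption for illustration (the text only says roughly 500 per class); substitute the totals printed earlier for your copy of the dataset.

```python
import math


def steps_to_cover(num_images, batch_size):
    """Number of batches needed to see every image once."""
    return math.ceil(num_images / batch_size)


num_images = 1027   # assumed total, for illustration only
batch_size = 128    # matches the generator defined earlier

full_pass = steps_to_cover(num_images, batch_size)
print("steps for one full pass:", full_pass)               # 9
print("images seen with 8 steps:", 8 * batch_size)         # 1024
```

Under that assumption, 8 steps cover 1,024 images per epoch and a ninth, smaller batch would be needed to cover the remainder.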

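Note the values per epoch. (The sample log below shows 9 steps per epoch; with steps_per_epoch=8 as set above, your progress counter reads 8/8.)

The loss and accuracy are a good indication of training progress. The model makes a guess as to the classification of the training data, then measures it against the known label and calculates the result. Accuracy is the fraction of correct guesses.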

Epoch 1/15
9/9 [==============================] - 9s 1s/step - loss: 0.8662 - acc: 0.5151
Epoch 2/15
9/9 [==============================] - 8s 927ms/step - loss: 0.7212 - acc: 0.5969
Epoch 3/15
9/9 [==============================] - 8s 921ms/step - loss: 0.6612 - acc: 0.6592
Epoch 4/15
9/9 [==============================] - 8s 925ms/step - loss: 0.3135 - acc: 0.8481
Epoch 5/15
9/9 [==============================] - 8s 919ms/step - loss: 0.4640 - acc: 0.8530
Epoch 6/15
9/9 [==============================] - 8s 896ms/step - loss: 0.2306 - acc: 0.9231
Epoch 7/15
9/9 [==============================] - 8s 915ms/step - loss: 0.1464 - acc: 0.9396
Epoch 8/15
9/9 [==============================] - 8s 935ms/step - loss: 0.2663 - acc: 0.8919
Epoch 9/15
9/9 [==============================] - 8s 883ms/step - loss: 0.0772 - acc: 0.9698
Epoch 10/15
9/9 [==============================] - 9s 951ms/step - loss: 0.0403 - acc: 0.9805
Epoch 11/15
9/9 [==============================] - 8s 891ms/step - loss: 0.2618 - acc: 0.9075
Epoch 12/15
9/9 [==============================] - 8s 902ms/step - loss: 0.0434 - acc: 0.9873
Epoch 13/15
9/9 [==============================] - 8s 904ms/step - loss: 0.0187 - acc: 0.9932
Epoch 14/15
9/9 [==============================] - 9s 951ms/step - loss: 0.0974 - acc: 0.9649
Epoch 15/15
9/9 [==============================] - 8s 877ms/step - loss: 0.2859 - acc: 0.9338
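The acc column in the log is the fraction of images whose thresholded prediction matches the label. A minimal sketch with made-up probabilities and labels (not taken from the training run above):

```python
def accuracy(probabilities, labels, threshold=0.5):
    """Fraction of predictions that match the labels after thresholding."""
    correct = sum(
        1 for p, y in zip(probabilities, labels)
        if (1 if p > threshold else 0) == y
    )
    return correct / len(labels)


# Made-up sigmoid outputs and their true labels (1 = human, 0 = horse).
probs = [0.92, 0.15, 0.60, 0.40, 0.08]
labels = [1, 0, 0, 1, 0]

print(accuracy(probs, labels))  # 0.6
```

Accuracy only counts hits and misses, so it can stay flat while the loss still improves as predictions become more confident.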

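9. Test the model

Now, run an actual prediction using the model. The code below lets you choose one or more files from your file system. It then uploads them and runs them through the model, indicating whether the object is a horse or a human.

You can download images from the internet to your file system to try them out! Note that the network may make a lot of mistakes even though training accuracy peaked above 99%.

That's due to something called overfitting: the neural network was trained with very limited data (there are only roughly 500 images of each class). As a result, it's very good at recognizing images that look like those in the training set, but it can fail often on images that are not in the training set.

That's a data point illustrating that the more data you train on, the better your final network tends to be.

There are many techniques that can improve training despite limited data, including image augmentation, but they're beyond the scope of this codelab.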

import numpy as np
from google.colab import files
from tensorflow.keras.preprocessing import image

uploaded = files.upload()

for fn in uploaded.keys():

  # predicting images
  path = '/content/' + fn
  img = image.load_img(path, target_size=(300, 300))
  x = image.img_to_array(img)
  x = x / 255.0  # match the 1./255 rescaling applied to the training images
  x = np.expand_dims(x, axis=0)

  images = np.vstack([x])
  classes = model.predict(images, batch_size=10)
  print(classes[0])
  if classes[0]>0.5:
    print(fn + " is a human")
  else:
    print(fn + " is a horse")

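For example, say that you want to test with the following image:

Here's what the Colab produces:

Although it's a cartoon graphic, it is still classified correctly.

The following image is also classified correctly:

Try some images of your own and explore!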

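10. Visualize intermediate representations

To get a feel for what kind of features your CNN has learned, a fun thing to do is visualize how an input gets transformed as it passes through the CNN.

Pick a random image from the training set, then generate a figure where each row is the output of a layer and each image in the row is a specific filter in that output feature map. Rerun the cell to generate intermediate representations for a variety of training images.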

import numpy as np
import random
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.image import img_to_array, load_img

# Let's define a new Model that will take an image as input, and will output
# intermediate representations for all layers in the previous model after
# the first.
successive_outputs = [layer.output for layer in model.layers[1:]]
visualization_model = tf.keras.models.Model(inputs=model.input, outputs=successive_outputs)
# Let's prepare a random input image from the training set.
horse_img_files = [os.path.join(train_horse_dir, f) for f in train_horse_names]
human_img_files = [os.path.join(train_human_dir, f) for f in train_human_names]
img_path = random.choice(horse_img_files + human_img_files)

img = load_img(img_path, target_size=(300, 300))  # this is a PIL image
x = img_to_array(img)  # Numpy array with shape (300, 300, 3)
x = x.reshape((1,) + x.shape)  # Numpy array with shape (1, 300, 300, 3)

# Rescale by 1/255
x /= 255

# Let's run our image through our network, thus obtaining all
# intermediate representations for this image.
successive_feature_maps = visualization_model.predict(x)

# Names of the layers whose outputs were collected above (the first layer is
# skipped), so that each plot title matches the feature map it is zipped with
layer_names = [layer.name for layer in model.layers[1:]]

# Now let's display our representations
for layer_name, feature_map in zip(layer_names, successive_feature_maps):
  if len(feature_map.shape) == 4:
    # Just do this for the conv / maxpool layers, not the fully-connected layers
    n_features = feature_map.shape[-1]  # number of features in feature map
    # The feature map has shape (1, size, size, n_features)
    size = feature_map.shape[1]
    # We will tile our images in this matrix
    display_grid = np.zeros((size, size * n_features))
    for i in range(n_features):
      # Postprocess the feature to make it visually palatable
      x = feature_map[0, :, :, i]
      x -= x.mean()
      if x.std()>0:
        x /= x.std()
      x *= 64
      x += 128
      x = np.clip(x, 0, 255).astype('uint8')
      # We'll tile each filter into this big horizontal grid
      display_grid[:, i * size : (i + 1) * size] = x
    # Display the grid
    scale = 20. / n_features
    plt.figure(figsize=(scale * n_features, scale))
    plt.title(layer_name)
    plt.grid(False)
    plt.imshow(display_grid, aspect='auto', cmap='viridis')

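Here are example results:

As you can see, you go from the raw pixels of the images to increasingly abstract and compact representations. The representations deeper in the network start highlighting what the network pays attention to, and they show fewer and fewer features being "activated." Most are set to zero. That's called sparsity. Representation sparsity is a key feature of deep learning.

These representations carry progressively less information about the original pixels of the image, but progressively more refined information about the class of the image. You can think of a CNN (or a deep network in general) as an information distillation pipeline.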
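Sparsity can be put into a number: the fraction of activations that are exactly zero after the ReLU. The sketch below uses a handful of made-up pre-activation values rather than real feature maps, and the helper names are illustrative.

```python
def relu(values):
    """Replace negative values with zero, as the 'relu' activation does."""
    return [max(0.0, v) for v in values]


def sparsity(activations):
    """Fraction of activations that are exactly zero."""
    zeros = sum(1 for a in activations if a == 0.0)
    return zeros / len(activations)


# Made-up pre-activation values for eight units.
pre_activations = [-1.2, 0.4, -0.3, 2.1, -0.8, -0.1, 0.0, 1.5]
activations = relu(pre_activations)

print(activations)
print(sparsity(activations))  # 0.625
```

With the real feature maps from the visualization cell, the equivalent measurement for one layer would be np.mean(feature_map == 0), which you can compare across layers to see the representations thin out.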

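11. Congratulations

You've learned how to use CNNs to classify complex images. To learn how to further enhance your computer vision models, proceed to the Use convolutional neural networks (CNNs) with large datasets to avoid overfitting codelab.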
