tensorflow

This commit is contained in:
Estom
2021-04-22 18:56:10 +08:00
parent c5a105fe3a
commit 88f91a24e9
551 changed files with 14356 additions and 48273 deletions

# TensorFlow 2.0 quickstart for beginners
> Original: [https://tensorflow.google.cn/tutorials/quickstart/beginner](https://tensorflow.google.cn/tutorials/quickstart/beginner)
**Note:** This document was translated by the TensorFlow community. Because community translations are best-effort, there is no guarantee that they are accurate or reflect the latest [official English documentation](https://tensorflow.google.cn/?hl=en). If you have suggestions for improving this translation, please submit a pull request to the [tensorflow/docs](https://github.com/tensorflow/docs) GitHub repository. To volunteer to write or review translations, join the [docs-zh-cn@tensorflow.org Google Group](https://groups.google.com/a/tensorflow.org/forum/#!forum/docs-zh-cn).
This is a [Google Colaboratory](https://colab.research.google.com/notebooks/welcome.ipynb) notebook file. Python programs run directly in the browser, which is a great way to learn TensorFlow. To follow this tutorial, click the button at the top of this page to run the notebook in Google Colab.
1. In Colab, connect to a Python runtime: at the top right of the menu bar, select *CONNECT*.
2. Run all the code cells: select *Runtime* > *Run all*.
Download and install the TensorFlow 2.0 beta package, then import TensorFlow into your program:
```py
# Import TensorFlow
import tensorflow as tf
```
Load and prepare the [MNIST dataset](http://yann.lecun.com/exdb/mnist/). Convert the samples from integers to floating-point numbers:
```py
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
```
Build the [`tf.keras.Sequential`](https://tensorflow.google.cn/api_docs/python/tf/keras/Sequential) model by stacking layers. Choose an optimizer and loss function for training:
```py
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
```
Train and evaluate the model:
```py
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test, verbose=2)
```
```py
Epoch 1/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2962 - accuracy: 0.9155
Epoch 2/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1420 - accuracy: 0.9581
Epoch 3/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1064 - accuracy: 0.9672
Epoch 4/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0885 - accuracy: 0.9730
Epoch 5/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0749 - accuracy: 0.9765
313/313 - 0s - loss: 0.0748 - accuracy: 0.9778
[0.07484959065914154, 0.9778000116348267]
```
The image classifier is now trained to about 98% accuracy on this dataset. To learn more, read the [TensorFlow tutorials](https://tensorflow.google.cn/tutorials/).

# TensorFlow 2.0 quickstart for experts
> Original: [https://tensorflow.google.cn/tutorials/quickstart/advanced](https://tensorflow.google.cn/tutorials/quickstart/advanced)
**Note:** This document was translated by the TensorFlow community. Because community translations are best-effort, there is no guarantee that they are accurate or reflect the latest [official English documentation](https://tensorflow.google.cn/?hl=en). If you have suggestions for improving this translation, please submit a pull request to the [tensorflow/docs](https://github.com/tensorflow/docs) GitHub repository. To volunteer to write or review translations, join the [docs-zh-cn@tensorflow.org Google Group](https://groups.google.com/a/tensorflow.org/forum/#!forum/docs-zh-cn).
This is a [Google Colaboratory](https://colab.research.google.com/notebooks/welcome.ipynb) notebook file. Python programs run directly in the browser, a great way to learn and use TensorFlow. To follow this tutorial, click the button at the top of this page and run the notebook in Google Colab.
1. In Colab, connect to a Python runtime: at the top right of the menu bar, select *CONNECT*.
2. Run all the notebook code cells: select *Runtime* > *Run all*.
Download and install the TensorFlow 2.0 beta package.
Import TensorFlow into your program:
```py
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten, Conv2D
from tensorflow.keras import Model
```
Load and prepare the [MNIST dataset](http://yann.lecun.com/exdb/mnist/).
```py
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
# Add a channels dimension
x_train = x_train[..., tf.newaxis]
x_test = x_test[..., tf.newaxis]
```
Use [`tf.data`](https://tensorflow.google.cn/api_docs/python/tf/data) to batch and shuffle the dataset:
```py
train_ds = tf.data.Dataset.from_tensor_slices(
(x_train, y_train)).shuffle(10000).batch(32)
test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(32)
```
Build the [`tf.keras`](https://tensorflow.google.cn/api_docs/python/tf/keras) model using the Keras [model subclassing API](https://tensorflow.google.cn/guide/keras#model_subclassing):
```py
class MyModel(Model):
def __init__(self):
super(MyModel, self).__init__()
self.conv1 = Conv2D(32, 3, activation='relu')
self.flatten = Flatten()
self.d1 = Dense(128, activation='relu')
self.d2 = Dense(10, activation='softmax')
def call(self, x):
x = self.conv1(x)
x = self.flatten(x)
x = self.d1(x)
return self.d2(x)
model = MyModel()
```
Choose an optimizer and loss function for training:
```py
loss_object = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()
```
Select metrics to measure the loss and the accuracy of the model. These metrics accumulate values over the epochs and then print the overall result.
```py
train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')
test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='test_accuracy')
```
Use [`tf.GradientTape`](https://tensorflow.google.cn/api_docs/python/tf/GradientTape) to train the model:
```py
@tf.function
def train_step(images, labels):
with tf.GradientTape() as tape:
predictions = model(images)
loss = loss_object(labels, predictions)
gradients = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
train_loss(loss)
train_accuracy(labels, predictions)
```
Test the model:
```py
@tf.function
def test_step(images, labels):
predictions = model(images)
t_loss = loss_object(labels, predictions)
test_loss(t_loss)
test_accuracy(labels, predictions)
```
```py
EPOCHS = 5
for epoch in range(EPOCHS):
  # Reset the metrics at the start of the next epoch
train_loss.reset_states()
train_accuracy.reset_states()
test_loss.reset_states()
test_accuracy.reset_states()
for images, labels in train_ds:
train_step(images, labels)
for test_images, test_labels in test_ds:
test_step(test_images, test_labels)
template = 'Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, Test Accuracy: {}'
print (template.format(epoch+1,
train_loss.result(),
train_accuracy.result()*100,
test_loss.result(),
test_accuracy.result()*100))
```
```py
WARNING:tensorflow:Layer my_model is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2. The layer has dtype float32 because its dtype defaults to floatx.
If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.
To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.
Epoch 1, Loss: 0.13825324177742004, Accuracy: 95.89166259765625, Test Loss: 0.07461485266685486, Test Accuracy: 97.47999572753906
Epoch 2, Loss: 0.04554400220513344, Accuracy: 98.61666870117188, Test Loss: 0.05126383528113365, Test Accuracy: 98.29000091552734
Epoch 3, Loss: 0.024927066639065742, Accuracy: 99.18500518798828, Test Loss: 0.05301696062088013, Test Accuracy: 98.30999755859375
Epoch 4, Loss: 0.014068767428398132, Accuracy: 99.52832794189453, Test Loss: 0.051672786474227905, Test Accuracy: 98.58000183105469
Epoch 5, Loss: 0.009344187565147877, Accuracy: 99.69166564941406, Test Loss: 0.06102905049920082, Test Accuracy: 98.25
```
The image classifier is now trained to roughly 98% accuracy on this dataset. To learn more, read the [TensorFlow tutorials](https://tensorflow.google.cn/tutorials/keras).

# Basic classification: Classify images of clothing
> Original: [https://tensorflow.google.cn/tutorials/keras/classification](https://tensorflow.google.cn/tutorials/keras/classification)
This guide trains a neural network model to classify images of clothing, such as sneakers and shirts. It's okay if you don't understand all the details; this is a fast-paced overview of a complete TensorFlow program, with the details explained as you go.
This guide uses [tf.keras](https://tensorflow.google.cn/guide/keras), a high-level API used to build and train models in TensorFlow.
```py
# TensorFlow and tf.keras
import tensorflow as tf
from tensorflow import keras
# Helper libraries
import numpy as np
import matplotlib.pyplot as plt
print(tf.__version__)
```
```py
2.3.0
```
## Import the Fashion MNIST dataset
This guide uses the [Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist) dataset, which contains 70,000 grayscale images in 10 categories. The images show individual articles of clothing at low resolution (28x28 pixels), as seen here:
| ![Fashion MNIST sprite](img/8a26efaab988f8c9054ea977baabb45a.png) |
| **Figure 1.** [Fashion-MNIST samples](https://github.com/zalandoresearch/fashion-mnist) (by Zalando, MIT License). |
Fashion MNIST is intended as a drop-in replacement for the classic [MNIST](http://yann.lecun.com/exdb/mnist/) dataset, often used as the "Hello, World" of machine learning programs for computer vision. The MNIST dataset contains images of handwritten digits (0, 1, 2, etc.) in a format identical to that of the articles of clothing you'll use here.
This guide uses Fashion MNIST for variety, and because it's a slightly more challenging problem than regular MNIST. Both datasets are relatively small and are used to verify that an algorithm works as expected. They're good starting points to test and debug code.
Here, 60,000 images are used to train the network and 10,000 images to evaluate how accurately the network learned to classify images. You can access Fashion MNIST directly from TensorFlow. Run the following code to import and load the Fashion MNIST data directly from TensorFlow:
```py
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
```
Loading the dataset returns four NumPy arrays:
* The `train_images` and `train_labels` arrays are the *training set*, the data the model uses to learn.
* The model is tested against the *test set*: the `test_images` and `test_labels` arrays.
The images are 28x28 NumPy arrays, with pixel values ranging from 0 to 255. The *labels* are an array of integers, ranging from 0 to 9. These correspond to the *class* of clothing the image represents:
| Label | Class |
| --- | --- |
| 0 | T-shirt/top |
| 1 | Trouser |
| 2 | Pullover |
| 3 | Dress |
| 4 | Coat |
| 5 | Sandal |
| 6 | Shirt |
| 7 | Sneaker |
| 8 | Bag |
| 9 | Ankle boot |
Each image is mapped to a single label. Since the *class names* are not included with the dataset, store them here to use later when plotting the images:
```py
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
```
## Explore the data
Let's explore the format of the dataset before training the model. The following shows there are 60,000 images in the training set, with each image represented as 28 x 28 pixels:
```py
train_images.shape
```
```py
(60000, 28, 28)
```
Likewise, there are 60,000 labels in the training set:
```py
len(train_labels)
```
```py
60000
```
Each label is an integer between 0 and 9:
```py
train_labels
```
```py
array([9, 0, 0, ..., 3, 0, 5], dtype=uint8)
```
There are 10,000 images in the test set. Again, each image is represented as 28 x 28 pixels:
```py
test_images.shape
```
```py
(10000, 28, 28)
```
And the test set contains 10,000 image labels:
```py
len(test_labels)
```
```py
10000
```
## Preprocess the data
The data must be preprocessed before training the network. If you inspect the first image in the training set, you will see that the pixel values fall in the range of 0 to 255:
```py
plt.figure()
plt.imshow(train_images[0])
plt.colorbar()
plt.grid(False)
plt.show()
```
![png](img/07fde30d678eaceba2bf9695ee89c403.png)
Scale these values to a range of 0 to 1 before feeding them to the neural network model. To do so, divide the values by 255. It's important that the *training set* and the *test set* be preprocessed in the same way:
```py
train_images = train_images / 255.0
test_images = test_images / 255.0
```
To verify that the data is in the correct format and that you're ready to build and train the network, let's display the first 25 images from the *training set* and display the class name below each image.
```py
plt.figure(figsize=(10,10))
for i in range(25):
plt.subplot(5,5,i+1)
plt.xticks([])
plt.yticks([])
plt.grid(False)
plt.imshow(train_images[i], cmap=plt.cm.binary)
plt.xlabel(class_names[train_labels[i]])
plt.show()
```
![png](img/0fc5058e71e5828192048ef6a6b9a595.png)
## Build the model
Building the neural network requires configuring the layers of the model, then compiling the model.
### Set up the layers
The basic building block of a neural network is the *layer*. Layers extract representations from the data fed into them. Hopefully, these representations are meaningful for the problem at hand.
Most of deep learning consists of chaining together simple layers. Most layers, such as [`tf.keras.layers.Dense`](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/Dense), have parameters that are learned during training.
```py
model = keras.Sequential([
keras.layers.Flatten(input_shape=(28, 28)),
keras.layers.Dense(128, activation='relu'),
keras.layers.Dense(10)
])
```
The first layer in this network, [`tf.keras.layers.Flatten`](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/Flatten), transforms the format of the images from a two-dimensional array (of 28 x 28 pixels) to a one-dimensional array (of 28 x 28 = 784 pixels). Think of this layer as unstacking rows of pixels in the image and lining them up. This layer has no parameters to learn; it only reformats the data.
After the pixels are flattened, the network consists of a sequence of two [`tf.keras.layers.Dense`](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/Dense) layers. These are densely connected, or fully connected, neural layers. The first `Dense` layer has 128 nodes (or neurons). The second (and last) layer returns a logits array with length of 10. Each node contains a score indicating which of the 10 classes the current image belongs to.
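As a side note, the reformatting that `Flatten` performs can be sketched with plain NumPy (a toy array standing in for one image; this is an illustration, not the layer's implementation):

```python
import numpy as np

# A stand-in for one 28x28 grayscale image.
image = np.arange(28 * 28).reshape(28, 28)

# Flatten only reformats: the 28x28 grid becomes one 784-element vector,
# row by row, with no parameters learned.
flat = image.reshape(-1)
print(flat.shape)
```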
### Compile the model
Before the model is ready for training, it needs a few more settings. These are added during the model's *compile* step:
* *Loss function* - measures how accurate the model is during training. You want to minimize this function to "steer" the model in the right direction.
* *Optimizer* - how the model is updated based on the data it sees and its loss function.
* *Metrics* - used to monitor the training and testing steps. The following example uses *accuracy*, the fraction of images that are correctly classified.
```py
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
```
## Train the model
Training the neural network model requires the following steps:
1. Feed the training data to the model. In this example, the training data is in the `train_images` and `train_labels` arrays.
2. The model learns to associate images and labels.
3. You ask the model to make predictions about a test set, in this example the `test_images` array.
4. Verify that the predictions match the labels from the `test_labels` array.
### Feed the model
To start training, call the `model.fit` method, so called because it "fits" the model to the training data:
```py
model.fit(train_images, train_labels, epochs=10)
```
```py
Epoch 1/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.4924 - accuracy: 0.8265
Epoch 2/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.3698 - accuracy: 0.8669
Epoch 3/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.3340 - accuracy: 0.8781
Epoch 4/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.3110 - accuracy: 0.8863
Epoch 5/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.2924 - accuracy: 0.8936
Epoch 6/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.2776 - accuracy: 0.8972
Epoch 7/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.2659 - accuracy: 0.9021
Epoch 8/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.2543 - accuracy: 0.9052
Epoch 9/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.2453 - accuracy: 0.9084
Epoch 10/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.2366 - accuracy: 0.9122
<tensorflow.python.keras.callbacks.History at 0x7fc85fa4f2e8>
```
As the model trains, the loss and accuracy metrics are displayed. This model reaches an accuracy of about 0.91 (or 91%) on the training data.
### Evaluate accuracy
Next, compare how the model performs on the test dataset:
```py
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print('\nTest accuracy:', test_acc)
```
```py
313/313 - 0s - loss: 0.3726 - accuracy: 0.8635
Test accuracy: 0.8634999990463257
```
It turns out that the accuracy on the test dataset is a little less than the accuracy on the training dataset. This gap between training accuracy and test accuracy represents *overfitting*. Overfitting happens when a machine learning model performs worse on new, previously unseen inputs than it does on the training data. An overfitted model "memorizes" the noise and details in the training dataset to a point where it negatively impacts the performance of the model on new data. For more information, see the following:
* [Demonstrate overfitting](https://tensorflow.google.cn/tutorials/keras/overfit_and_underfit#demonstrate_overfitting)
* [Strategies to prevent overfitting](https://tensorflow.google.cn/tutorials/keras/overfit_and_underfit#strategies_to_prevent_overfitting)
### Make predictions
With the model trained, you can use it to make predictions about some images. The model's outputs are linear, i.e. [logits](https://developers.google.cn/machine-learning/glossary#logits). You can attach a softmax layer to convert the logits to probabilities, which are easier to interpret.
```py
probability_model = tf.keras.Sequential([model,
tf.keras.layers.Softmax()])
```
```py
predictions = probability_model.predict(test_images)
```
Here, the model has predicted the label for each image in the test set. Let's take a look at the first prediction:
```py
predictions[0]
```
```py
array([6.9982241e-07, 5.5403369e-08, 1.8353174e-07, 1.4761626e-07,
2.4380807e-07, 1.9273469e-04, 1.8122660e-06, 6.5027133e-02,
1.7891599e-06, 9.3477517e-01], dtype=float32)
```
A prediction is an array of 10 numbers. They represent the model's "confidence" that the image corresponds to each of the 10 different articles of clothing. You can see which label has the highest confidence value:
```py
np.argmax(predictions[0])
```
```py
9
```
So, the model is most confident that this image is an ankle boot, or `class_names[9]`. Examining the test label shows that this classification is correct:
```py
test_labels[0]
```
```py
9
```
You can graph this to look at the full set of 10 class predictions.
```py
def plot_image(i, predictions_array, true_label, img):
predictions_array, true_label, img = predictions_array, true_label[i], img[i]
plt.grid(False)
plt.xticks([])
plt.yticks([])
plt.imshow(img, cmap=plt.cm.binary)
predicted_label = np.argmax(predictions_array)
if predicted_label == true_label:
color = 'blue'
else:
color = 'red'
plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
100*np.max(predictions_array),
class_names[true_label]),
color=color)
def plot_value_array(i, predictions_array, true_label):
predictions_array, true_label = predictions_array, true_label[i]
plt.grid(False)
plt.xticks(range(10))
plt.yticks([])
thisplot = plt.bar(range(10), predictions_array, color="#777777")
plt.ylim([0, 1])
predicted_label = np.argmax(predictions_array)
thisplot[predicted_label].set_color('red')
thisplot[true_label].set_color('blue')
```
### Verify predictions
With the model trained, you can use it to make predictions about some images.
Let's look at the 0th image, predictions, and prediction array. Correct prediction labels are blue and incorrect prediction labels are red. The number gives the percentage (out of 100) for the predicted label.
```py
i = 0
plt.figure(figsize=(6,3))
plt.subplot(1,2,1)
plot_image(i, predictions[i], test_labels, test_images)
plt.subplot(1,2,2)
plot_value_array(i, predictions[i], test_labels)
plt.show()
```
![png](img/55d2924ed5a33ffad4b9f727cd335194.png)
```py
i = 12
plt.figure(figsize=(6,3))
plt.subplot(1,2,1)
plot_image(i, predictions[i], test_labels, test_images)
plt.subplot(1,2,2)
plot_value_array(i, predictions[i], test_labels)
plt.show()
```
![png](img/0c7474d216a51a2b258a81a689920596.png)
Let's plot several images with their predictions. Note that the model can be wrong even when very confident.
```py
# Plot the first X test images, their predicted labels, and the true labels.
# Color correct predictions in blue and incorrect predictions in red.
num_rows = 5
num_cols = 3
num_images = num_rows*num_cols
plt.figure(figsize=(2*2*num_cols, 2*num_rows))
for i in range(num_images):
plt.subplot(num_rows, 2*num_cols, 2*i+1)
plot_image(i, predictions[i], test_labels, test_images)
plt.subplot(num_rows, 2*num_cols, 2*i+2)
plot_value_array(i, predictions[i], test_labels)
plt.tight_layout()
plt.show()
```
![png](img/8f40b70083328d6f68f1d2c5821927d1.png)
## Use the trained model
Finally, use the trained model to make a prediction about a single image.
```py
# Grab an image from the test dataset.
img = test_images[1]
print(img.shape)
```
```py
(28, 28)
```
[`tf.keras`](https://tensorflow.google.cn/api_docs/python/tf/keras) models are optimized to make predictions on a *batch*, or collection, of examples at once. Accordingly, even though you're using a single image, you need to add it to a list:
```py
# Add the image to a batch where it's the only member.
img = (np.expand_dims(img,0))
print(img.shape)
```
```py
(1, 28, 28)
```
Now predict the correct label for this image:
```py
predictions_single = probability_model.predict(img)
print(predictions_single)
```
```py
[[1.0675135e-05 2.4023437e-12 9.9772269e-01 1.3299730e-09 1.2968916e-03
8.7469149e-14 9.6970733e-04 5.4669354e-19 2.4514609e-11 1.8405429e-12]]
```
```py
plot_value_array(1, predictions_single[0], test_labels)
_ = plt.xticks(range(10), class_names, rotation=45)
```
![png](img/35aea8e2802acf908920febe4776fbf0.png)
[`keras.Model.predict`](https://tensorflow.google.cn/api_docs/python/tf/keras/Model#predict) returns a list of lists, one list for each image in the batch of data. Grab the predictions for our (only) image in the batch:
```py
np.argmax(predictions_single[0])
```
```py
2
```
And the model predicts a label as expected.
```py
# MIT License
#
# Copyright (c) 2017 François Chollet
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
```

# Text classification with movie reviews
> Original: [https://tensorflow.google.cn/tutorials/keras/text_classification](https://tensorflow.google.cn/tutorials/keras/text_classification)
**Note:** This document was translated by the TensorFlow community. Because community translations are best-effort, there is no guarantee that they are accurate or reflect the latest [official English documentation](https://tensorflow.google.cn/?hl=en). If you have suggestions for improving this translation, please submit a pull request to the [tensorflow/docs](https://github.com/tensorflow/docs) GitHub repository. To volunteer to write or review translations, join the [docs-zh-cn@tensorflow.org Google Group](https://groups.google.com/a/tensorflow.org/forum/#!forum/docs-zh-cn).
This notebook classifies movie reviews as *positive* or *negative* using the text of the review. This is an example of *binary*, or two-class, classification, an important and widely applicable kind of machine learning problem.
We'll use the [IMDB dataset](https://tensorflow.google.cn/api_docs/python/tf/keras/datasets/imdb) from the [Internet Movie Database](https://www.imdb.com/), which contains the text of 50,000 movie reviews. These are split into 25,000 reviews for training and 25,000 reviews for testing. The training and testing sets are *balanced*, meaning they contain an equal number of positive and negative reviews.
This notebook uses [tf.keras](https://tensorflow.google.cn/guide/keras), a high-level API used to build and train models in TensorFlow. For a more advanced text classification tutorial using [`tf.keras`](https://tensorflow.google.cn/api_docs/python/tf/keras), see the [MLCC Text Classification Guide](https://developers.google.cn/machine-learning/guides/text-classification/).
```py
import tensorflow as tf
from tensorflow import keras
import numpy as np
print(tf.__version__)
```
```py
2.3.0
```
## Download the IMDB dataset
The IMDB dataset comes packaged with TensorFlow. It has already been preprocessed such that the reviews (sequences of words) have been converted to sequences of integers, where each integer represents a specific word in a dictionary.
The following code downloads the IMDB dataset to your machine (or uses a cached copy if you've already downloaded it):
```py
imdb = keras.datasets.imdb
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
```
```py
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
17465344/17464789 [==============================] - 0s 0us/step
```
The argument `num_words=10000` keeps the 10,000 most frequently occurring words in the training data. The rare words are discarded to keep the size of the data manageable.
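The effect can be sketched as follows; `cap_vocabulary` is an illustrative helper, not part of the Keras API (the Keras IMDB loader substitutes an out-of-vocabulary marker, index 2 by default, rather than literally dropping the words):

```py
# A minimal sketch of what a num_words cutoff does to one encoded review:
# indices at or above the cutoff are replaced by an out-of-vocabulary marker.
def cap_vocabulary(sequence, num_words, oov_index=2):
    return [idx if idx < num_words else oov_index for idx in sequence]

print(cap_vocabulary([1, 14, 22, 16, 43, 530, 12503], num_words=10000))
```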
## Explore the data
Let's take a moment to understand the format of the data. The dataset comes preprocessed: each example is an array of integers representing the words of the movie review. Each label is an integer value of either 0 or 1, where 0 is a negative review and 1 is a positive review.
```py
print("Training entries: {}, labels: {}".format(len(train_data), len(train_labels)))
```
```py
Training entries: 25000, labels: 25000
```
The text of reviews has been converted to integers, where each integer represents a specific word in a dictionary. Here's what the first review looks like:
```py
print(train_data[0])
```
```py
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]
```
Movie reviews may be different lengths. The following code shows the number of words in the first and second reviews. Since inputs to a neural network must be the same length, we'll need to resolve this later.
```py
len(train_data[0]), len(train_data[1])
```
```py
(218, 189)
```
### Convert the integers back to words
It may be useful to know how to convert integers back to text. Here, we'll create a helper function to query a dictionary object that contains the integer-to-string mapping:
```py
# A dictionary mapping words to an integer index
word_index = imdb.get_word_index()
# The first indices are reserved
word_index = {k:(v+3) for k,v in word_index.items()}
word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2 # unknown
word_index["<UNUSED>"] = 3
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
def decode_review(text):
return ' '.join([reverse_word_index.get(i, '?') for i in text])
```
```py
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json
1646592/1641221 [==============================] - 0s 0us/step
```
Now we can use the `decode_review` function to display the text of the first review:
```py
decode_review(train_data[0])
```
```py
"<START> this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert <UNK> is an amazing actor and now the same being director <UNK> father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for <UNK> and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also <UNK> to the two little boy's that played the <UNK> of norman and paul they were just brilliant children are often left out of the <UNK> list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with us all"
```
## Prepare the data
The reviews (the arrays of integers) must be converted to tensors before being fed into the neural network. This conversion can be done in a couple of ways:
* Convert the arrays into vectors of 0s and 1s indicating word occurrence, similar to one-hot encoding. For example, the sequence [3, 5] would become a 10,000-dimensional vector that is all zeros except for indices 3 and 5, which are ones. Then make this the first layer in the network, a dense layer that can handle floating-point vector data. This approach is memory-intensive, though, requiring a `num_words * num_reviews` size matrix.
* Alternatively, we can pad the arrays so they all have the same length, then create an integer tensor of shape `max_length * num_reviews`. We can use an embedding layer capable of handling this shape as the first layer in our network.
In this tutorial, we'll use the second approach.
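For reference, the first (multi-hot) approach can be sketched in a few lines of NumPy; `multi_hot_encode` is an illustrative helper, not a library function:

```py
import numpy as np

def multi_hot_encode(sequences, dimension=10000):
    # One row per review; position j is 1.0 if word index j occurs in the review.
    results = np.zeros((len(sequences), dimension), dtype=np.float32)
    for i, seq in enumerate(sequences):
        results[i, seq] = 1.0
    return results

# Two toy "reviews" given as lists of word indices (dimension shrunk for display).
encoded = multi_hot_encode([[3, 5], [1, 3, 3]], dimension=8)
print(encoded)
```

Note that repeated word indices collapse to a single 1, which is one reason this encoding loses word order and word counts.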
Since the movie reviews must be the same length, we'll use the [pad_sequences](https://tensorflow.google.cn/api_docs/python/tf/keras/preprocessing/sequence/pad_sequences) function to standardize the lengths:
```py
train_data = keras.preprocessing.sequence.pad_sequences(train_data,
value=word_index["<PAD>"],
padding='post',
maxlen=256)
test_data = keras.preprocessing.sequence.pad_sequences(test_data,
value=word_index["<PAD>"],
padding='post',
maxlen=256)
```
Now let's look at the length of the examples:
```py
len(train_data[0]), len(train_data[1])
```
```py
(256, 256)
```
And inspect the (now padded) first review:
```py
print(train_data[0])
```
```py
[ 1 14 22 16 43 530 973 1622 1385 65 458 4468 66 3941
4 173 36 256 5 25 100 43 838 112 50 670 2 9
35 480 284 5 150 4 172 112 167 2 336 385 39 4
172 4536 1111 17 546 38 13 447 4 192 50 16 6 147
2025 19 14 22 4 1920 4613 469 4 22 71 87 12 16
43 530 38 76 15 13 1247 4 22 17 515 17 12 16
626 18 2 5 62 386 12 8 316 8 106 5 4 2223
5244 16 480 66 3785 33 4 130 12 16 38 619 5 25
124 51 36 135 48 25 1415 33 6 22 12 215 28 77
52 5 14 407 16 82 2 8 4 107 117 5952 15 256
4 2 7 3766 5 723 36 71 43 530 476 26 400 317
46 7 4 2 1029 13 104 88 4 381 15 297 98 32
2071 56 26 141 6 194 7486 18 4 226 22 21 134 476
26 480 5 144 30 5535 18 51 36 28 224 92 25 104
4 226 65 16 38 1334 88 12 16 283 5 16 4472 113
103 32 15 16 5345 19 178 32 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0]
```
## Build the model
The neural network is created by stacking layers, which requires two main architectural decisions:
* How many layers to use in the model?
* How many *hidden units* to use for each layer?
In this example, the input data consists of arrays of word indices. The labels to predict are either 0 or 1. Let's build a model for this problem:
```py
# The input shape is the vocabulary size used for the movie reviews (10,000 words)
vocab_size = 10000
model = keras.Sequential()
model.add(keras.layers.Embedding(vocab_size, 16))
model.add(keras.layers.GlobalAveragePooling1D())
model.add(keras.layers.Dense(16, activation='relu'))
model.add(keras.layers.Dense(1, activation='sigmoid'))
model.summary()
```
```py
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding (Embedding) (None, None, 16) 160000
_________________________________________________________________
global_average_pooling1d (Gl (None, 16) 0
_________________________________________________________________
dense (Dense) (None, 16) 272
_________________________________________________________________
dense_1 (Dense) (None, 1) 17
=================================================================
Total params: 160,289
Trainable params: 160,289
Non-trainable params: 0
_________________________________________________________________
```
The layers are stacked sequentially to build the classifier:
1. The first layer is an `Embedding` layer. This layer takes the integer-encoded vocabulary and looks up the embedding vector for each word index. These vectors are learned as the model trains. The vectors add a dimension to the output array. The resulting dimensions are `(batch, sequence, embedding)`.
2. Next, a `GlobalAveragePooling1D` layer returns a fixed-length output vector for each example by averaging over the sequence dimension. This allows the model to handle input of variable length in the simplest way possible.
3. This fixed-length output vector is piped through a fully connected (`Dense`) layer with 16 hidden units.
4. The last layer is densely connected with a single output node. Using the `sigmoid` activation function, this value is a float between 0 and 1, representing a probability, or confidence level.
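The shape bookkeeping in steps 1 and 2 can be sketched with NumPy, treating the embedding as a plain lookup table (random toy numbers standing in for trained vectors):

```py
import numpy as np

vocab_size, embedding_dim = 8, 4
embedding = np.random.rand(vocab_size, embedding_dim)  # one row per word index

batch = np.array([[1, 3, 5], [2, 2, 0]])  # (batch=2, sequence=3) of word indices
embedded = embedding[batch]   # lookup -> (batch, sequence, embedding) = (2, 3, 4)
pooled = embedded.mean(axis=1)  # average over the sequence -> (2, 4), fixed length
print(embedded.shape, pooled.shape)
```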
### Hidden units
The above model has two intermediate or "hidden" layers between the input and output. The number of outputs (units, nodes, or neurons) is the dimension of the representational space for the layer. In other words, the amount of freedom the network is allowed when learning an internal representation.
If a model has more hidden units (a higher-dimensional representation space), and/or more layers, then the network can learn more complex representations. However, it makes the network more computationally expensive and may lead to learning unwanted patterns, patterns that improve performance on training data but not on the test data. This is called *overfitting*, and we'll explore it later.
### Loss function and optimizer
A model needs a loss function and an optimizer for training. Since this is a binary classification problem and the model outputs a probability (a single-unit layer with a sigmoid activation), we'll use the `binary_crossentropy` loss function.
This isn't the only choice for a loss function; you could, for instance, choose `mean_squared_error`. But, generally, `binary_crossentropy` is better for dealing with probabilities: it measures the "distance" between probability distributions, or in our case, between the ground-truth distribution and the predictions.
Later, when we explore regression problems (say, to predict the price of a house), we'll see how to use another loss function called mean squared error.
Now, configure the model to use an optimizer and a loss function:
```py
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
```
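To build intuition for what `binary_crossentropy` measures, here is a small NumPy sketch of the formula (not the Keras implementation, which handles clipping and batching internally):

```py
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0), then average the per-example cross-entropy.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0])
# Confident and correct: small loss.
print(binary_crossentropy(y_true, np.array([0.9, 0.1])))
# Confident and wrong: much larger loss.
print(binary_crossentropy(y_true, np.array([0.1, 0.9])))
```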
## Create a validation set
When training, we want to check the accuracy of the model on data it hasn't seen before. Create a *validation set* by setting apart 10,000 examples from the original training data. Why not use the testing set now? Our goal is to develop and tune our model using only the training data, then use the test data just once to evaluate our accuracy.
```py
x_val = train_data[:10000]
partial_x_train = train_data[10000:]
y_val = train_labels[:10000]
partial_y_train = train_labels[10000:]
```
## Train the model
Train the model for 40 epochs in mini-batches of 512 samples. This is 40 iterations over all samples in the `x_train` and `y_train` tensors. While training, monitor the model's loss and accuracy on the 10,000 samples from the validation set:
```py
history = model.fit(partial_x_train,
partial_y_train,
epochs=40,
batch_size=512,
validation_data=(x_val, y_val),
verbose=1)
```
```py
Epoch 1/40
30/30 [==============================] - 1s 18ms/step - loss: 0.6924 - accuracy: 0.5173 - val_loss: 0.6911 - val_accuracy: 0.5699
Epoch 2/40
30/30 [==============================] - 0s 10ms/step - loss: 0.6886 - accuracy: 0.5734 - val_loss: 0.6863 - val_accuracy: 0.6309
Epoch 3/40
30/30 [==============================] - 0s 10ms/step - loss: 0.6810 - accuracy: 0.6439 - val_loss: 0.6766 - val_accuracy: 0.7367
Epoch 4/40
30/30 [==============================] - 0s 10ms/step - loss: 0.6667 - accuracy: 0.7411 - val_loss: 0.6595 - val_accuracy: 0.7328
Epoch 5/40
30/30 [==============================] - 0s 10ms/step - loss: 0.6431 - accuracy: 0.7602 - val_loss: 0.6327 - val_accuracy: 0.7677
Epoch 6/40
30/30 [==============================] - 0s 10ms/step - loss: 0.6086 - accuracy: 0.7896 - val_loss: 0.5968 - val_accuracy: 0.7894
Epoch 7/40
30/30 [==============================] - 0s 10ms/step - loss: 0.5654 - accuracy: 0.8147 - val_loss: 0.5550 - val_accuracy: 0.8102
Epoch 8/40
30/30 [==============================] - 0s 10ms/step - loss: 0.5180 - accuracy: 0.8337 - val_loss: 0.5115 - val_accuracy: 0.8230
Epoch 9/40
30/30 [==============================] - 0s 10ms/step - loss: 0.4709 - accuracy: 0.8535 - val_loss: 0.4705 - val_accuracy: 0.8356
Epoch 10/40
30/30 [==============================] - 0s 10ms/step - loss: 0.4269 - accuracy: 0.8655 - val_loss: 0.4342 - val_accuracy: 0.8454
Epoch 11/40
30/30 [==============================] - 0s 10ms/step - loss: 0.3887 - accuracy: 0.8763 - val_loss: 0.4040 - val_accuracy: 0.8545
Epoch 12/40
30/30 [==============================] - 0s 10ms/step - loss: 0.3566 - accuracy: 0.8843 - val_loss: 0.3799 - val_accuracy: 0.8598
Epoch 13/40
30/30 [==============================] - 0s 10ms/step - loss: 0.3299 - accuracy: 0.8911 - val_loss: 0.3608 - val_accuracy: 0.8660
Epoch 14/40
30/30 [==============================] - 0s 10ms/step - loss: 0.3070 - accuracy: 0.8975 - val_loss: 0.3458 - val_accuracy: 0.8702
Epoch 15/40
30/30 [==============================] - 0s 10ms/step - loss: 0.2876 - accuracy: 0.9021 - val_loss: 0.3334 - val_accuracy: 0.8727
Epoch 16/40
30/30 [==============================] - 0s 10ms/step - loss: 0.2708 - accuracy: 0.9073 - val_loss: 0.3234 - val_accuracy: 0.8753
Epoch 17/40
30/30 [==============================] - 0s 10ms/step - loss: 0.2558 - accuracy: 0.9130 - val_loss: 0.3154 - val_accuracy: 0.8773
Epoch 18/40
30/30 [==============================] - 0s 10ms/step - loss: 0.2428 - accuracy: 0.9175 - val_loss: 0.3102 - val_accuracy: 0.8782
Epoch 19/40
30/30 [==============================] - 0s 10ms/step - loss: 0.2308 - accuracy: 0.9214 - val_loss: 0.3032 - val_accuracy: 0.8812
Epoch 20/40
30/30 [==============================] - 0s 10ms/step - loss: 0.2194 - accuracy: 0.9246 - val_loss: 0.2988 - val_accuracy: 0.8818
Epoch 21/40
30/30 [==============================] - 0s 10ms/step - loss: 0.2093 - accuracy: 0.9280 - val_loss: 0.2956 - val_accuracy: 0.8821
Epoch 22/40
30/30 [==============================] - 0s 10ms/step - loss: 0.2000 - accuracy: 0.9321 - val_loss: 0.2921 - val_accuracy: 0.8838
Epoch 23/40
30/30 [==============================] - 0s 10ms/step - loss: 0.1912 - accuracy: 0.9357 - val_loss: 0.2901 - val_accuracy: 0.8846
Epoch 24/40
30/30 [==============================] - 0s 10ms/step - loss: 0.1829 - accuracy: 0.9396 - val_loss: 0.2885 - val_accuracy: 0.8847
Epoch 25/40
30/30 [==============================] - 0s 10ms/step - loss: 0.1756 - accuracy: 0.9439 - val_loss: 0.2874 - val_accuracy: 0.8844
Epoch 26/40
30/30 [==============================] - 0s 10ms/step - loss: 0.1681 - accuracy: 0.9465 - val_loss: 0.2864 - val_accuracy: 0.8855
Epoch 27/40
30/30 [==============================] - 0s 10ms/step - loss: 0.1617 - accuracy: 0.9481 - val_loss: 0.2867 - val_accuracy: 0.8844
Epoch 28/40
30/30 [==============================] - 0s 10ms/step - loss: 0.1548 - accuracy: 0.9519 - val_loss: 0.2865 - val_accuracy: 0.8861
Epoch 29/40
30/30 [==============================] - 0s 10ms/step - loss: 0.1485 - accuracy: 0.9543 - val_loss: 0.2872 - val_accuracy: 0.8849
Epoch 30/40
30/30 [==============================] - 0s 10ms/step - loss: 0.1426 - accuracy: 0.9561 - val_loss: 0.2881 - val_accuracy: 0.8854
Epoch 31/40
30/30 [==============================] - 0s 10ms/step - loss: 0.1372 - accuracy: 0.9587 - val_loss: 0.2895 - val_accuracy: 0.8851
Epoch 32/40
30/30 [==============================] - 0s 10ms/step - loss: 0.1320 - accuracy: 0.9609 - val_loss: 0.2899 - val_accuracy: 0.8856
Epoch 33/40
30/30 [==============================] - 0s 10ms/step - loss: 0.1267 - accuracy: 0.9625 - val_loss: 0.2911 - val_accuracy: 0.8851
Epoch 34/40
30/30 [==============================] - 0s 10ms/step - loss: 0.1219 - accuracy: 0.9649 - val_loss: 0.2931 - val_accuracy: 0.8851
Epoch 35/40
30/30 [==============================] - 0s 10ms/step - loss: 0.1173 - accuracy: 0.9666 - val_loss: 0.2948 - val_accuracy: 0.8863
Epoch 36/40
30/30 [==============================] - 0s 10ms/step - loss: 0.1127 - accuracy: 0.9685 - val_loss: 0.2985 - val_accuracy: 0.8851
Epoch 37/40
30/30 [==============================] - 0s 10ms/step - loss: 0.1086 - accuracy: 0.9688 - val_loss: 0.2998 - val_accuracy: 0.8860
Epoch 38/40
30/30 [==============================] - 0s 10ms/step - loss: 0.1045 - accuracy: 0.9716 - val_loss: 0.3033 - val_accuracy: 0.8839
Epoch 39/40
30/30 [==============================] - 0s 10ms/step - loss: 0.1007 - accuracy: 0.9723 - val_loss: 0.3049 - val_accuracy: 0.8847
Epoch 40/40
30/30 [==============================] - 0s 10ms/step - loss: 0.0967 - accuracy: 0.9737 - val_loss: 0.3087 - val_accuracy: 0.8832
```
## Evaluate the model
Let's see how the model performs. Two values are returned: loss (a number representing the error; lower is better) and accuracy.
```py
results = model.evaluate(test_data, test_labels, verbose=2)
print(results)
```
```py
782/782 - 1s - loss: 0.3298 - accuracy: 0.8729
[0.32977813482284546, 0.8728799819946289]
```
This fairly naive approach achieves an accuracy of about 87%. With more advanced approaches, the model should get closer to 95%.
## Create a graph of accuracy and loss over time
`model.fit()` returns a `History` object that contains a dictionary with everything that happened during training:
```py
history_dict = history.history
history_dict.keys()
```
```py
dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
```
There are four entries: one for each monitored metric during training and validation. We can use these to plot the training and validation loss, as well as the training and validation accuracy, for comparison.
```py
import matplotlib.pyplot as plt
acc = history_dict['accuracy']
val_acc = history_dict['val_accuracy']
loss = history_dict['loss']
val_loss = history_dict['val_loss']
epochs = range(1, len(acc) + 1)
# "bo" is for "blue dot"
plt.plot(epochs, loss, 'bo', label='Training loss')
# "b" is for "solid blue line"
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
```
![png](img/9c459926609b3f3452425d5e76209223.png)
```py
plt.clf()   # clear the figure
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
```
![png](img/6cd4981eb3c80dc3045b45bd7fd0e7ea.png)
In this plot, the dots represent the training loss and accuracy, and the solid lines are the validation loss and accuracy.
Notice that the training loss *decreases* with each epoch and the training accuracy *increases* with each epoch. This is expected when using gradient descent optimization: it should minimize the desired quantity on every iteration.
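The per-iteration minimization just described can be sketched with a toy gradient-descent loop (plain Python, illustrative only; not part of the tutorial's code):

```py
# Minimize f(w) = (w - 3)^2 with plain gradient descent.
# The gradient is f'(w) = 2 * (w - 3); each step moves w downhill,
# so the loss shrinks on every iteration, just like the training curve.
def gradient_descent(w=0.0, lr=0.1, steps=100):
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)
        w -= lr * grad
    return w

print(gradient_descent())  # converges toward the minimum at w = 3
```

The validation curve has no such guarantee, which is why it can turn around while the training curve keeps improving.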
This isn't the case for the validation loss and accuracy: they seem to peak after about twenty epochs. This is an example of overfitting: the model performs better on the training data than it does on data it has never seen before. After this point, the model over-optimizes and learns representations *specific* to the training data that do not *generalize* to test data.
For this particular case, we could prevent overfitting by simply stopping the training after twenty or so epochs. Later, you'll see how to do this automatically with a callback.
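That callback (in Keras, `tf.keras.callbacks.EarlyStopping`) essentially tracks the best validation loss seen so far and stops once it fails to improve for a set number of epochs. A minimal plain-Python sketch of that patience logic (illustrative only, not the real API):

```py
# Sketch of early-stopping logic: stop once val_loss has not improved
# for `patience` consecutive epochs. This mirrors the idea behind
# tf.keras.callbacks.EarlyStopping but is not the actual implementation.
def early_stop_epoch(val_losses, patience=3):
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch  # epoch at which training would stop
    return len(val_losses)    # patience never exhausted

# val_loss bottoms out at epoch 3, so training stops 3 epochs later
print(early_stop_epoch([0.40, 0.35, 0.33, 0.34, 0.36, 0.38, 0.41]))
```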
```py
# MIT License
#
# Copyright (c) 2017 François Chollet
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
```
# Text classification of movie reviews with Keras and TensorFlow Hub
> Original: [https://tensorflow.google.cn/tutorials/keras/text_classification_with_hub](https://tensorflow.google.cn/tutorials/keras/text_classification_with_hub)
**Note:** This documentation was translated by the TensorFlow community. Because community translations are best-effort, there is no guarantee that they are accurate or reflect the latest [official English documentation](https://tensorflow.google.cn/?hl=en). If you have suggestions to improve this translation, please submit a pull request to the [tensorflow/docs](https://github.com/tensorflow/docs) GitHub repository. To volunteer to write or review translations, join the [docs-zh-cn@tensorflow.org Google Group](https://groups.google.com/a/tensorflow.org/forum/#!forum/docs-zh-cn).
This notebook classifies movie reviews as *positive* or *negative* using the text of the review. This is an example of *binary* (two-class) classification, an important and widely applicable kind of machine learning problem.
This tutorial demonstrates the basic application of transfer learning with TensorFlow Hub and Keras.
We'll use the [IMDB dataset](https://tensorflow.google.cn/api_docs/python/tf/keras/datasets/imdb), which contains the text of 50,000 movie reviews from the [Internet Movie Database](https://www.imdb.com/). 25,000 of these reviews are used for training and the other 25,000 for testing. The training and testing sets are *balanced*, meaning they contain an equal number of positive and negative reviews.
This notebook uses [tf.keras](https://tensorflow.google.cn/guide/keras), a high-level API to build and train models in TensorFlow, and [TensorFlow Hub](https://tensorflow.google.cn/hub), a library and platform for transfer learning. For a more advanced text classification tutorial using [`tf.keras`](https://tensorflow.google.cn/api_docs/python/tf/keras), see the [MLCC Text Classification Guide](https://developers.google.cn/machine-learning/guides/text-classification/).
```py
import numpy as np
import tensorflow as tf
!pip install -q tensorflow-hub
!pip install -q tfds-nightly
import tensorflow_hub as hub
import tensorflow_datasets as tfds
print("Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("Hub version: ", hub.__version__)
print("GPU is", "available" if tf.config.experimental.list_physical_devices("GPU") else "NOT AVAILABLE")
```
```py
WARNING: You are using pip version 20.2.2; however, version 20.2.3 is available.
You should consider upgrading via the '/tmpfs/src/tf_docs_env/bin/python -m pip install --upgrade pip' command.
WARNING: You are using pip version 20.2.2; however, version 20.2.3 is available.
You should consider upgrading via the '/tmpfs/src/tf_docs_env/bin/python -m pip install --upgrade pip' command.
Version: 2.3.0
Eager mode: True
Hub version: 0.9.0
GPU is available
```
## Download the IMDB dataset
The IMDB dataset is available on [TensorFlow Datasets](https://github.com/tensorflow/datasets). The following code downloads the IMDB dataset to your machine (or to the Colab runtime):
```py
# Split the training set into 60% and 40%, so we'll end up with 15,000 examples
# for training, 10,000 examples for validation and 25,000 examples for testing.
train_data, validation_data, test_data = tfds.load(
name="imdb_reviews",
split=('train[:60%]', 'train[60%:]', 'test'),
as_supervised=True)
```
```py
Downloading and preparing dataset imdb_reviews/plain_text/1.0.0 (download: 80.23 MiB, generated: Unknown size, total: 80.23 MiB) to /home/kbuilder/tensorflow_datasets/imdb_reviews/plain_text/1.0.0...
Shuffling and writing examples to /home/kbuilder/tensorflow_datasets/imdb_reviews/plain_text/1.0.0.incompleteZDZ3AR/imdb_reviews-train.tfrecord
Shuffling and writing examples to /home/kbuilder/tensorflow_datasets/imdb_reviews/plain_text/1.0.0.incompleteZDZ3AR/imdb_reviews-test.tfrecord
Shuffling and writing examples to /home/kbuilder/tensorflow_datasets/imdb_reviews/plain_text/1.0.0.incompleteZDZ3AR/imdb_reviews-unsupervised.tfrecord
Dataset imdb_reviews downloaded and prepared to /home/kbuilder/tensorflow_datasets/imdb_reviews/plain_text/1.0.0\. Subsequent calls will reuse this data.
```
## Explore the data
Let's take a moment to understand the format of the data. Each example is a sentence representing the movie review, together with a corresponding label. The sentence is not preprocessed in any way. The label is an integer value of either 0 or 1, where 0 is a negative review and 1 is a positive review.
Let's print the first ten examples.
```py
train_examples_batch, train_labels_batch = next(iter(train_data.batch(10)))
train_examples_batch
```
```py
<tf.Tensor: shape=(10,), dtype=string, numpy=
array([b"This was an absolutely terrible movie. Don't be lured in by Christopher Walken or Michael Ironside. Both are great actors, but this must simply be their worst role in history. Even their great acting could not redeem this movie's ridiculous storyline. This movie is an early nineties US propaganda piece. The most pathetic scenes were those when the Columbian rebels were making their cases for revolutions. Maria Conchita Alonso appeared phony, and her pseudo-love affair with Walken was nothing but a pathetic emotional plug in a movie that was devoid of any real meaning. I am disappointed that there are movies like this, ruining actor's like Christopher Walken's good name. I could barely sit through it.",
b'I have been known to fall asleep during films, but this is usually due to a combination of things including, really tired, being warm and comfortable on the sette and having just eaten a lot. However on this occasion I fell asleep because the film was rubbish. The plot development was constant. Constantly slow and boring. Things seemed to happen, but with no explanation of what was causing them or why. I admit, I may have missed part of the film, but i watched the majority of it and everything just seemed to happen of its own accord without any real concern for anything else. I cant recommend this film at all.',
b'Mann photographs the Alberta Rocky Mountains in a superb fashion, and Jimmy Stewart and Walter Brennan give enjoyable performances as they always seem to do. <br /><br />But come on Hollywood - a Mountie telling the people of Dawson City, Yukon to elect themselves a marshal (yes a marshal!) and to enforce the law themselves, then gunfighters battling it out on the streets for control of the town? <br /><br />Nothing even remotely resembling that happened on the Canadian side of the border during the Klondike gold rush. Mr. Mann and company appear to have mistaken Dawson City for Deadwood, the Canadian North for the American Wild West.<br /><br />Canadian viewers be prepared for a Reefer Madness type of enjoyable howl with this ludicrous plot, or, to shake your head in disgust.',
b'This is the kind of film for a snowy Sunday afternoon when the rest of the world can go ahead with its own business as you descend into a big arm-chair and mellow for a couple of hours. Wonderful performances from Cher and Nicolas Cage (as always) gently row the plot along. There are no rapids to cross, no dangerous waters, just a warm and witty paddle through New York life at its best. A family film in every sense and one that deserves the praise it received.',
b'As others have mentioned, all the women that go nude in this film are mostly absolutely gorgeous. The plot very ably shows the hypocrisy of the female libido. When men are around they want to be pursued, but when no "men" are around, they become the pursuers of a 14 year old boy. And the boy becomes a man really fast (we should all be so lucky at this age!). He then gets up the courage to pursue his true love.',
b"This is a film which should be seen by anybody interested in, effected by, or suffering from an eating disorder. It is an amazingly accurate and sensitive portrayal of bulimia in a teenage girl, its causes and its symptoms. The girl is played by one of the most brilliant young actresses working in cinema today, Alison Lohman, who was later so spectacular in 'Where the Truth Lies'. I would recommend that this film be shown in all schools, as you will never see a better on this subject. Alison Lohman is absolutely outstanding, and one marvels at her ability to convey the anguish of a girl suffering from this compulsive disorder. If barometers tell us the air pressure, Alison Lohman tells us the emotional pressure with the same degree of accuracy. Her emotional range is so precise, each scene could be measured microscopically for its gradations of trauma, on a scale of rising hysteria and desperation which reaches unbearable intensity. Mare Winningham is the perfect choice to play her mother, and does so with immense sympathy and a range of emotions just as finely tuned as Lohman's. Together, they make a pair of sensitive emotional oscillators vibrating in resonance with one another. This film is really an astonishing achievement, and director Katt Shea should be proud of it. The only reason for not seeing it is if you are not interested in people. But even if you like nature films best, this is after all animal behaviour at the sharp edge. Bulimia is an extreme version of how a tormented soul can destroy her own body in a frenzy of despair. And if we don't sympathise with people suffering from the depths of despair, then we are dead inside.",
b'Okay, you have:<br /><br />Penelope Keith as Miss Herringbone-Tweed, B.B.E. (Backbone of England.) She\'s killed off in the first scene - that\'s right, folks; this show has no backbone!<br /><br />Peter O\'Toole as Ol\' Colonel Cricket from The First War and now the emblazered Lord of the Manor.<br /><br />Joanna Lumley as the ensweatered Lady of the Manor, 20 years younger than the colonel and 20 years past her own prime but still glamourous (Brit spelling, not mine) enough to have a toy-boy on the side. It\'s alright, they have Col. Cricket\'s full knowledge and consent (they guy even comes \'round for Christmas!) Still, she\'s considerate of the colonel enough to have said toy-boy her own age (what a gal!)<br /><br />David McCallum as said toy-boy, equally as pointlessly glamourous as his squeeze. Pilcher couldn\'t come up with any cover for him within the story, so she gave him a hush-hush job at the Circus.<br /><br />and finally:<br /><br />Susan Hampshire as Miss Polonia Teacups, Venerable Headmistress of the Venerable Girls\' Boarding-School, serving tea in her office with a dash of deep, poignant advice for life in the outside world just before graduation. Her best bit of advice: "I\'ve only been to Nancherrow (the local Stately Home of England) once. I thought it was very beautiful but, somehow, not part of the real world." Well, we can\'t say they didn\'t warn us.<br /><br />Ah, Susan - time was, your character would have been running the whole show. They don\'t write \'em like that any more. Our loss, not yours.<br /><br />So - with a cast and setting like this, you have the re-makings of "Brideshead Revisited," right?<br /><br />Wrong! They took these 1-dimensional supporting roles because they paid so well. After all, acting is one of the oldest temp-jobs there is (YOU name another!)<br /><br />First warning sign: lots and lots of backlighting. 
They get around it by shooting outdoors - "hey, it\'s just the sunlight!"<br /><br />Second warning sign: Leading Lady cries a lot. When not crying, her eyes are moist. That\'s the law of romance novels: Leading Lady is "dewy-eyed."<br /><br />Henceforth, Leading Lady shall be known as L.L.<br /><br />Third warning sign: L.L. actually has stars in her eyes when she\'s in love. Still, I\'ll give Emily Mortimer an award just for having to act with that spotlight in her eyes (I wonder . did they use contacts?)<br /><br />And lastly, fourth warning sign: no on-screen female character is "Mrs." She\'s either "Miss" or "Lady."<br /><br />When all was said and done, I still couldn\'t tell you who was pursuing whom and why. I couldn\'t even tell you what was said and done.<br /><br />To sum up: they all live through World War II without anything happening to them at all.<br /><br />OK, at the end, L.L. finds she\'s lost her parents to the Japanese prison camps and baby sis comes home catatonic. Meanwhile (there\'s always a "meanwhile,") some young guy L.L. had a crush on (when, I don\'t know) comes home from some wartime tough spot and is found living on the street by Lady of the Manor (must be some street if SHE\'s going to find him there.) Both war casualties are whisked away to recover at Nancherrow (SOMEBODY has to be "whisked away" SOMEWHERE in these romance stories!)<br /><br />Great drama.',
b'The film is based on a genuine 1950s novel.<br /><br />Journalist Colin McInnes wrote a set of three "London novels": "Absolute Beginners", "City of Spades" and "Mr Love and Justice". I have read all three. The first two are excellent. The last, perhaps an experiment that did not come off. But McInnes\'s work is highly acclaimed; and rightly so. This musical is the novelist\'s ultimate nightmare - to see the fruits of one\'s mind being turned into a glitzy, badly-acted, soporific one-dimensional apology of a film that says it captures the spirit of 1950s London, and does nothing of the sort.<br /><br />Thank goodness Colin McInnes wasn\'t alive to witness it.',
b'I really love the sexy action and sci-fi films of the sixties and its because of the actress\'s that appeared in them. They found the sexiest women to be in these films and it didn\'t matter if they could act (Remember "Candy"?). The reason I was disappointed by this film was because it wasn\'t nostalgic enough. The story here has a European sci-fi film called "Dragonfly" being made and the director is fired. So the producers decide to let a young aspiring filmmaker (Jeremy Davies) to complete the picture. They\'re is one real beautiful woman in the film who plays Dragonfly but she\'s barely in it. Film is written and directed by Roman Coppola who uses some of his fathers exploits from his early days and puts it into the script. I wish the film could have been an homage to those early films. They could have lots of cameos by actors who appeared in them. There is one actor in this film who was popular from the sixties and its John Phillip Law (Barbarella). Gerard Depardieu, Giancarlo Giannini and Dean Stockwell appear as well. I guess I\'m going to have to continue waiting for a director to make a good homage to the films of the sixties. If any are reading this, "Make it as sexy as you can"! I\'ll be waiting!',
b'Sure, this one isn\'t really a blockbuster, nor does it target such a position. "Dieter" is the first name of a quite popular German musician, who is either loved or hated for his kind of acting and thats exactly what this movie is about. It is based on the autobiography "Dieter Bohlen" wrote a few years ago but isn\'t meant to be accurate on that. The movie is filled with some sexual offensive content (at least for American standard) which is either amusing (not for the other "actors" of course) or dumb - it depends on your individual kind of humor or on you being a "Bohlen"-Fan or not. Technically speaking there isn\'t much to criticize. Speaking of me I find this movie to be an OK-movie.'],
dtype=object)>
```
Let's also print the first ten labels.
```py
train_labels_batch
```
```py
<tf.Tensor: shape=(10,), dtype=int64, numpy=array([0, 0, 0, 1, 1, 1, 0, 0, 0, 0])>
```
## Build the model
A neural network is created by stacking layers. This requires three main architectural decisions:
* How to represent the text?
* How many layers to use in the model?
* How many *hidden units* to use for each layer?
In this example, the input data consists of sentences. The labels to predict are either 0 or 1.
One way to represent the text is to convert sentences into embedding vectors. We can use a pre-trained text embedding as the first layer, which gives us three advantages:
* we don't have to worry about text preprocessing,
* we can benefit from transfer learning,
* the embedding has a fixed size, so it's simpler to process.
For this example we will use a **pre-trained text embedding model** from [TensorFlow Hub](https://tensorflow.google.cn/hub) called [google/tf2-preview/gnews-swivel-20dim/1](https://hub.tensorflow.google.cn/google/tf2-preview/gnews-swivel-20dim/1).
There are three other pre-trained models to test for the sake of this tutorial:
* [google/tf2-preview/gnews-swivel-20dim-with-oov/1](https://hub.tensorflow.google.cn/google/tf2-preview/gnews-swivel-20dim-with-oov/1): the same as [google/tf2-preview/gnews-swivel-20dim/1](https://hub.tensorflow.google.cn/google/tf2-preview/gnews-swivel-20dim/1), but with 2.5% of the vocabulary converted to OOV buckets. This can help if the vocabulary of the task and the vocabulary of the model don't fully overlap.
* [google/tf2-preview/nnlm-en-dim50/1](https://hub.tensorflow.google.cn/google/tf2-preview/nnlm-en-dim50/1): a much larger model with a vocabulary of about 1M words and 50 dimensions.
* [google/tf2-preview/nnlm-en-dim128/1](https://hub.tensorflow.google.cn/google/tf2-preview/nnlm-en-dim128/1): an even larger model with a vocabulary of about 1M words and 128 dimensions.
Let's first create a Keras layer that uses a TensorFlow Hub model to embed sentences, and try it out on a couple of input examples. Note that no matter the length of the input text, the output shape of the embeddings is `(num_examples, embedding_dimension)`:
```py
embedding = "https://hub.tensorflow.google.cn/google/tf2-preview/gnews-swivel-20dim/1"
hub_layer = hub.KerasLayer(embedding, input_shape=[],
dtype=tf.string, trainable=True)
hub_layer(train_examples_batch[:3])
```
```py
<tf.Tensor: shape=(3, 20), dtype=float32, numpy=
array([[ 1.765786 , -3.882232 , 3.9134233 , -1.5557289 , -3.3362343 ,
-1.7357955 , -1.9954445 , 1.2989551 , 5.081598 , -1.1041286 ,
-2.0503852 , -0.72675157, -0.65675956, 0.24436149, -3.7208383 ,
2.0954835 , 2.2969332 , -2.0689783 , -2.9489717 , -1.1315987 ],
[ 1.8804485 , -2.5852382 , 3.4066997 , 1.0982676 , -4.056685 ,
-4.891284 , -2.785554 , 1.3874227 , 3.8476458 , -0.9256538 ,
-1.896706 , 1.2113281 , 0.11474707, 0.76209456, -4.8791065 ,
2.906149 , 4.7087674 , -2.3652055 , -3.5015898 , -1.6390051 ],
[ 0.71152234, -0.6353217 , 1.7385626 , -1.1168286 , -0.5451594 ,
-1.1808156 , 0.09504455, 1.4653089 , 0.66059524, 0.79308075,
-2.2268345 , 0.07446612, -1.4075904 , -0.70645386, -1.907037 ,
1.4419787 , 1.9551861 , -0.42660055, -2.8022065 , 0.43727064]],
dtype=float32)>
```
Now let's build the full model:
```py
model = tf.keras.Sequential()
model.add(hub_layer)
model.add(tf.keras.layers.Dense(16, activation='relu'))
model.add(tf.keras.layers.Dense(1))
model.summary()
```
```py
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
keras_layer (KerasLayer) (None, 20) 400020
_________________________________________________________________
dense (Dense) (None, 16) 336
_________________________________________________________________
dense_1 (Dense) (None, 1) 17
=================================================================
Total params: 400,373
Trainable params: 400,373
Non-trainable params: 0
_________________________________________________________________
```
The layers are stacked sequentially to build the classifier:
1. The first layer is a TensorFlow Hub layer. This layer uses a pre-trained SavedModel to map a sentence into its embedding vector. The pre-trained text embedding model we are using ([google/tf2-preview/gnews-swivel-20dim/1](https://hub.tensorflow.google.cn/google/tf2-preview/gnews-swivel-20dim/1)) splits the sentence into tokens, embeds each token, and then combines them. The resulting dimensions are: `(num_examples, embedding_dimension)`.
2. This fixed-length output vector is piped through a fully connected (`Dense`) layer with 16 hidden units.
3. The last layer is densely connected with a single output node. It produces a raw logit; applying a sigmoid to that logit yields a float between 0 and 1, representing a probability, or confidence level.
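Concretely, the final `Dense(1)` layer in the code above has no activation of its own, so it emits a raw logit; a sigmoid maps that logit to the 0-to-1 range. A minimal sketch of the mapping:

```py
import math

# Sigmoid squashes a raw logit into a probability in (0, 1):
# large positive logits approach 1, large negative logits approach 0.
def sigmoid(logit):
    return 1.0 / (1.0 + math.exp(-logit))

print(sigmoid(0.0))   # 0.5: a zero logit means "no preference"
print(sigmoid(4.0))   # close to 1: a confident positive prediction
```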
Let's compile the model.
### Loss function and optimizer
A model needs a loss function and an optimizer for training. Since this is a binary classification problem and the model outputs a single score interpreted as a probability, we'll use the `binary_crossentropy` loss function.
This isn't the only choice for a loss function; you could, for instance, choose `mean_squared_error`. But, generally, `binary_crossentropy` is better for dealing with probabilities: it measures the "distance" between probability distributions, or in our case, between the ground-truth distribution and the predictions.
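As a concrete illustration of that "distance", the binary cross-entropy of a single example with true label `y` and predicted probability `p` is `-(y*log(p) + (1-y)*log(1-p))`; a small sketch:

```py
import math

# Binary cross-entropy for one example: it penalizes a confident wrong
# answer much more heavily than a near-miss.
def binary_crossentropy(y_true, p_pred):
    return -(y_true * math.log(p_pred) + (1 - y_true) * math.log(1 - p_pred))

print(binary_crossentropy(1, 0.9))   # small loss: confident and correct
print(binary_crossentropy(1, 0.1))   # large loss: confident and wrong
```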
Later, when we explore regression problems (say, predicting the price of a house), we'll see how to use another loss function called mean squared error.
Now, configure the model to use an optimizer and a loss function:
```py
model.compile(optimizer='adam',
loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
metrics=['accuracy'])
```
## Train the model
Train the model for 20 epochs in mini-batches of 512 samples. That is 20 iterations over all samples in the `x_train` and `y_train` tensors. While training, monitor the model's loss and accuracy on the 10,000 samples from the validation set:
```py
history = model.fit(train_data.shuffle(10000).batch(512),
epochs=20,
validation_data=validation_data.batch(512),
verbose=1)
```
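A quick sanity check on the `30/30` step counter that appears in the logs below: with the 60%/40% split described earlier there are 15,000 training examples, and batches of 512 give 30 optimizer steps per epoch:

```py
import math

# 60% of the 25,000 IMDB training reviews = 15,000 examples;
# ceil(15000 / 512) = 30 mini-batch steps per epoch.
steps_per_epoch = math.ceil(15000 / 512)
print(steps_per_epoch)  # 30
```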
```py
Epoch 1/20
30/30 [==============================] - 2s 64ms/step - loss: 1.5444 - accuracy: 0.4965 - val_loss: 0.9259 - val_accuracy: 0.4705
Epoch 2/20
30/30 [==============================] - 2s 59ms/step - loss: 0.7667 - accuracy: 0.4990 - val_loss: 0.7017 - val_accuracy: 0.5327
Epoch 3/20
30/30 [==============================] - 2s 58ms/step - loss: 0.6631 - accuracy: 0.5799 - val_loss: 0.6387 - val_accuracy: 0.6238
Epoch 4/20
30/30 [==============================] - 2s 58ms/step - loss: 0.6156 - accuracy: 0.6327 - val_loss: 0.6051 - val_accuracy: 0.6390
Epoch 5/20
30/30 [==============================] - 2s 57ms/step - loss: 0.5819 - accuracy: 0.6623 - val_loss: 0.5761 - val_accuracy: 0.6639
Epoch 6/20
30/30 [==============================] - 2s 57ms/step - loss: 0.5492 - accuracy: 0.6983 - val_loss: 0.5475 - val_accuracy: 0.6873
Epoch 7/20
30/30 [==============================] - 2s 58ms/step - loss: 0.5159 - accuracy: 0.7294 - val_loss: 0.5176 - val_accuracy: 0.7277
Epoch 8/20
30/30 [==============================] - 2s 58ms/step - loss: 0.4813 - accuracy: 0.7609 - val_loss: 0.4884 - val_accuracy: 0.7490
Epoch 9/20
30/30 [==============================] - 2s 58ms/step - loss: 0.4472 - accuracy: 0.7869 - val_loss: 0.4602 - val_accuracy: 0.7747
Epoch 10/20
30/30 [==============================] - 2s 58ms/step - loss: 0.4141 - accuracy: 0.8113 - val_loss: 0.4352 - val_accuracy: 0.7983
Epoch 11/20
30/30 [==============================] - 2s 57ms/step - loss: 0.3837 - accuracy: 0.8312 - val_loss: 0.4113 - val_accuracy: 0.8074
Epoch 12/20
30/30 [==============================] - 2s 58ms/step - loss: 0.3558 - accuracy: 0.8482 - val_loss: 0.3910 - val_accuracy: 0.8152
Epoch 13/20
30/30 [==============================] - 2s 57ms/step - loss: 0.3305 - accuracy: 0.8611 - val_loss: 0.3727 - val_accuracy: 0.8270
Epoch 14/20
30/30 [==============================] - 2s 58ms/step - loss: 0.3071 - accuracy: 0.8746 - val_loss: 0.3602 - val_accuracy: 0.8455
Epoch 15/20
30/30 [==============================] - 2s 58ms/step - loss: 0.2872 - accuracy: 0.8840 - val_loss: 0.3445 - val_accuracy: 0.8462
Epoch 16/20
30/30 [==============================] - 2s 58ms/step - loss: 0.2678 - accuracy: 0.8942 - val_loss: 0.3333 - val_accuracy: 0.8538
Epoch 17/20
30/30 [==============================] - 2s 58ms/step - loss: 0.2505 - accuracy: 0.9010 - val_loss: 0.3243 - val_accuracy: 0.8557
Epoch 18/20
30/30 [==============================] - 2s 57ms/step - loss: 0.2351 - accuracy: 0.9073 - val_loss: 0.3172 - val_accuracy: 0.8634
Epoch 19/20
30/30 [==============================] - 2s 58ms/step - loss: 0.2209 - accuracy: 0.9154 - val_loss: 0.3108 - val_accuracy: 0.8660
Epoch 20/20
30/30 [==============================] - 2s 57ms/step - loss: 0.2082 - accuracy: 0.9224 - val_loss: 0.3058 - val_accuracy: 0.8676
```
## Evaluate the model
Let's see how the model performs. Two values are returned: loss (a number representing the error; lower is better) and accuracy.
```py
results = model.evaluate(test_data.batch(512), verbose=2)
for name, value in zip(model.metrics_names, results):
print("%s: %.3f" % (name, value))
```
```py
49/49 - 1s - loss: 0.3208 - accuracy: 0.8546
loss: 0.321
accuracy: 0.855
```
This fairly naive approach achieves an accuracy of about 85%. With more advanced approaches, the model should get closer to 95%.
## Further reading
For a more general way to work with string inputs, and for a more detailed analysis of the progress of accuracy and loss during training, see [here](https://tensorflow.google.cn/tutorials/keras/basic_text_classification).
```py
# MIT License
#
# Copyright (c) 2017 François Chollet
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
```
# Basic regression: Predict fuel efficiency
> Original: [https://tensorflow.google.cn/tutorials/keras/regression](https://tensorflow.google.cn/tutorials/keras/regression)
**Note:** This documentation was translated by the TensorFlow community. Because community translations are best-effort, there is no guarantee that they are accurate or reflect the latest [official English documentation](https://tensorflow.google.cn/?hl=en). If you have suggestions to improve this translation, please submit a pull request to the [tensorflow/docs](https://github.com/tensorflow/docs) GitHub repository. To volunteer to write or review translations, join the [docs-zh-cn@tensorflow.org Google Group](https://groups.google.com/a/tensorflow.org/forum/#!forum/docs-zh-cn).
In a *regression* problem, we aim to predict the output of a continuous value, like a price or a probability. Contrast this with a *classification* problem, where we aim to select a class from a list of classes (for example, given a picture that contains an apple or an orange, recognizing which fruit is in the picture).
This notebook uses the classic [Auto MPG](https://archive.ics.uci.edu/ml/datasets/auto+mpg) dataset and builds a model to predict the fuel efficiency of late-1970s and early-1980s automobiles. To do this, we'll provide the model with descriptions of many automobiles from that time period. These descriptions include attributes such as cylinders, displacement, horsepower, and weight.
This example uses the [`tf.keras`](https://tensorflow.google.cn/api_docs/python/tf/keras) API; see [this guide](https://tensorflow.google.cn/guide/keras) for details.
```py
# Use seaborn for pairplot
pip install -q seaborn
```
```py
WARNING: You are using pip version 20.2.2; however, version 20.2.3 is available.
You should consider upgrading via the '/tmpfs/src/tf_docs_env/bin/python -m pip install --upgrade pip' command.
```
```py
import pathlib
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
print(tf.__version__)
```
```py
2.3.0
```
## The Auto MPG dataset
The dataset is available from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/).
### Get the data
First download the dataset.
```py
dataset_path = keras.utils.get_file("auto-mpg.data", "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data")
dataset_path
```
```py
Downloading data from http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data
32768/30286 [================================] - 0s 1us/step
'/home/kbuilder/.keras/datasets/auto-mpg.data'
```
Import it using pandas.
```py
column_names = ['MPG','Cylinders','Displacement','Horsepower','Weight',
'Acceleration', 'Model Year', 'Origin']
raw_dataset = pd.read_csv(dataset_path, names=column_names,
na_values = "?", comment='\t',
sep=" ", skipinitialspace=True)
dataset = raw_dataset.copy()
dataset.tail()
```
### Clean the data
The dataset contains a few unknown values.
```py
dataset.isna().sum()
```
```py
MPG 0
Cylinders 0
Displacement 0
Horsepower 6
Weight 0
Acceleration 0
Model Year 0
Origin 0
dtype: int64
```
To keep this initial example simple, drop those rows.
```py
dataset = dataset.dropna()
```
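Dropping the rows is the simplest fix; an alternative worth knowing (hypothetical here, not what this tutorial does) is to impute the missing `Horsepower` values with the column median:

```py
import pandas as pd
import numpy as np

# Hypothetical alternative to dropna(): fill missing horsepower values
# with the median of the observed values instead of discarding the rows.
df = pd.DataFrame({"Horsepower": [130.0, np.nan, 150.0, 95.0, np.nan]})
df["Horsepower"] = df["Horsepower"].fillna(df["Horsepower"].median())
print(df["Horsepower"].tolist())  # [130.0, 130.0, 150.0, 95.0, 130.0]
```

Imputation keeps more training data at the cost of injecting a value the car never actually had, so dropping rows is the safer default for a small demo like this.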
`"Origin"` 列实际上代表分类,而不仅仅是一个数字。所以把它转换为独热码 one-hot:
```py
origin = dataset.pop('Origin')
```
```py
dataset['USA'] = (origin == 1)*1.0
dataset['Europe'] = (origin == 2)*1.0
dataset['Japan'] = (origin == 3)*1.0
dataset.tail()
```
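The same one-hot expansion can also be done in a single call with `pandas.get_dummies`; a small sketch on a toy `Origin` column (an equivalent alternative, not the tutorial's approach):

```py
import pandas as pd

# One-hot encode a categorical column with get_dummies; this is
# equivalent to the manual (origin == k) * 1.0 columns built above.
toy = pd.DataFrame({"Origin": [1, 2, 3, 1]})
onehot = pd.get_dummies(toy["Origin"], prefix="Origin").astype(float)
print(onehot.columns.tolist())  # ['Origin_1', 'Origin_2', 'Origin_3']
```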
### Split the data into train and test
Now split the dataset into a training set and a test set.
We will use the test set in the final evaluation of our model.
```py
train_dataset = dataset.sample(frac=0.8,random_state=0)
test_dataset = dataset.drop(train_dataset.index)
```
### Inspect the data
Have a quick look at the joint distribution of a few pairs of columns from the training set.
```py
sns.pairplot(train_dataset[["MPG", "Cylinders", "Displacement", "Weight"]], diag_kind="kde")
```
```py
<seaborn.axisgrid.PairGrid at 0x7f708ca93e80>
```
![png](img/4a4c68a2d8914e8b1b75bed4a9b81a5b.png)
Also look at the overall statistics:
```py
train_stats = train_dataset.describe()
train_stats.pop("MPG")
train_stats = train_stats.transpose()
train_stats
```
### Split features from labels
Separate the target value, or "label", from the features. This label is the value that you will train the model to predict.
```py
train_labels = train_dataset.pop('MPG')
test_labels = test_dataset.pop('MPG')
```
### Normalize the data
Look again at the `train_stats` block above and note how different the ranges of each feature are.
It is good practice to normalize features that use different scales and ranges. Although the model *might* converge without feature normalization, normalization makes training easier, and without it the resulting model depends on the choice of units used in the input.
Note: Although we intentionally generate these statistics from only the training dataset, these statistics will also be used to normalize the test dataset. We need to do that to project the test dataset into the same distribution that the model has been trained on.
```py
def norm(x):
return (x - train_stats['mean']) / train_stats['std']
normed_train_data = norm(train_dataset)
normed_test_data = norm(test_dataset)
```
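The crucial detail is that `norm` applies the *training* statistics to every split, including the test set. A minimal pure-Python sketch of that contract (using the population standard deviation for brevity; pandas' `describe()` reports the sample standard deviation):

```python
def fit_stats(values):
    """Compute mean and (population) std from the training split only."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var ** 0.5

def z_norm(values, mean, std):
    """Z-score any split using the TRAINING statistics."""
    return [(v - mean) / std for v in values]

train_vals = [2.0, 4.0, 6.0]
mean, std = fit_stats(train_vals)            # stats come from train only
test_normed = z_norm([4.0, 10.0], mean, std)  # applied unchanged to test data
```

Recomputing the statistics on the test set would silently put the two splits on different scales, which is exactly the leakage the note above warns against.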
We will use this normalized data to train the model.
Caution: The statistics used to normalize the inputs here (mean and standard deviation) need to be applied to any other data fed to the model, together with the one-hot encoding that we did earlier. That includes the test dataset as well as live data when the model is used in production.
## The model
### Build the model
Let's build our model. Here, we'll use a `Sequential` model with two densely connected hidden layers, and an output layer that returns a single, continuous value. The model-building steps are wrapped in a function, `build_model`, since we'll create a second model later on.
```py
def build_model():
model = keras.Sequential([
layers.Dense(64, activation='relu', input_shape=[len(train_dataset.keys())]),
layers.Dense(64, activation='relu'),
layers.Dense(1)
])
optimizer = tf.keras.optimizers.RMSprop(0.001)
model.compile(loss='mse',
optimizer=optimizer,
metrics=['mae', 'mse'])
return model
```
```py
model = build_model()
```
### Inspect the model
Use the `.summary` method to print a simple description of the model.
```py
model.summary()
```
```py
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 64) 640
_________________________________________________________________
dense_1 (Dense) (None, 64) 4160
_________________________________________________________________
dense_2 (Dense) (None, 1) 65
=================================================================
Total params: 4,865
Trainable params: 4,865
Non-trainable params: 0
_________________________________________________________________
```
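These parameter counts follow from the fully connected layer formula `params = (inputs + 1) * units`, where the `+ 1` accounts for the bias. With the 9 input features this dataset ends up with after the one-hot expansion:

```python
def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return (inputs + 1) * units

counts = [dense_params(9, 64),    # dense:   640
          dense_params(64, 64),   # dense_1: 4160
          dense_params(64, 1)]    # dense_2: 65
print(counts, sum(counts))        # [640, 4160, 65] 4865
```

The total matches the "Total params: 4,865" line in the summary, which is a quick sanity check that the input shape is what you expect.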
Now try out the model. Take a batch of 10 examples from the training data and call `model.predict` on it:
```py
example_batch = normed_train_data[:10]
example_result = model.predict(example_batch)
example_result
```
```py
array([[0.15074062],
[0.0973136 ],
[0.17310914],
[0.08873479],
[0.52456 ],
[0.05311462],
[0.49406645],
[0.04333409],
[0.12005241],
[0.6703117 ]], dtype=float32)
```
It seems to be working, and it produces a result of the expected shape and type.
### Train the model
Train the model for 1,000 epochs, and record the training and validation accuracy in the `history` object.
```py
# Display training progress by printing a single dot for each completed epoch
class PrintDot(keras.callbacks.Callback):
def on_epoch_end(self, epoch, logs):
if epoch % 100 == 0: print('')
print('.', end='')
EPOCHS = 1000
history = model.fit(
normed_train_data, train_labels,
epochs=EPOCHS, validation_split = 0.2, verbose=0,
callbacks=[PrintDot()])
```
```py
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
```
Visualize the model's training progress using the stats stored in the `history` object.
```py
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
hist.tail()
```
```py
def plot_history(history):
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
plt.figure()
plt.xlabel('Epoch')
plt.ylabel('Mean Abs Error [MPG]')
plt.plot(hist['epoch'], hist['mae'],
label='Train Error')
plt.plot(hist['epoch'], hist['val_mae'],
label = 'Val Error')
plt.ylim([0,5])
plt.legend()
plt.figure()
plt.xlabel('Epoch')
plt.ylabel('Mean Square Error [$MPG^2$]')
plt.plot(hist['epoch'], hist['mse'],
label='Train Error')
plt.plot(hist['epoch'], hist['val_mse'],
label = 'Val Error')
plt.ylim([0,20])
plt.legend()
plt.show()
plot_history(history)
```
![png](img/7fe4fe0b14735050369dc31f05672d65.png)
![png](img/29af7886a5834acb3b056b86d97b4128.png)
The graph shows that after about 100 epochs the validation error stops improving and even starts to degrade. Let's update the `model.fit` call to automatically stop training when the validation score doesn't improve. We'll use an *EarlyStopping callback* that tests a training condition for every epoch. If a set number of epochs elapses without showing improvement, the training stops automatically.
You can learn more about this callback [here](https://tensorflow.google.cn/versions/master/api_docs/python/tf/keras/callbacks/EarlyStopping).
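The patience logic can be sketched in a few lines of plain Python (a simplified model of the callback, ignoring details such as `min_delta` and best-weight restoration):

```python
def early_stop_epoch(val_losses, patience=10):
    """Return the epoch at which training would stop, or None if it runs out."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0   # improvement: reset the patience counter
        else:
            wait += 1              # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

# Validation loss improves for 3 epochs, then plateaus -> stop 10 epochs later.
losses = [3.0, 2.5, 2.2] + [2.3] * 10
print(early_stop_epoch(losses))  # 12
```

This shows why a larger `patience` tolerates noisy validation curves at the cost of some extra training.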
```py
model = build_model()
# The patience parameter is the number of epochs to check for improvement
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)
history = model.fit(normed_train_data, train_labels, epochs=EPOCHS,
validation_split = 0.2, verbose=0, callbacks=[early_stop, PrintDot()])
plot_history(history)
```
```py
....................................................................................................
...........................
```
![png](img/253f679c0d56ad236d24246ddb70d466.png)
![png](img/0f98889f249aed7e8f8f5e90e5432e08.png)
The graph shows that on the validation set the average error is usually around +/- 2 MPG. Is this good? We'll leave that decision up to you.
Let's see how well the model generalizes by using the **test set**, which we did not use when training the model. This tells us how well we can expect the model to predict when we use it in the real world.
```py
loss, mae, mse = model.evaluate(normed_test_data, test_labels, verbose=2)
print("Testing set Mean Abs Error: {:5.2f} MPG".format(mae))
```
```py
3/3 - 0s - loss: 5.9941 - mae: 1.8809 - mse: 5.9941
Testing set Mean Abs Error: 1.88 MPG
```
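For reference, the reported MAE is simply the average absolute difference between labels and predictions, expressed in the label's units (MPG here):

```python
def mean_abs_error(y_true, y_pred):
    """MAE: mean of |label - prediction| over all examples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

print(mean_abs_error([20.0, 30.0], [22.0, 27.0]))  # 2.5
```

Because the errors are not squared, MAE is easier to read off in physical units than MSE, which is why the tutorial quotes it for the test set.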
### Make predictions
Finally, predict MPG values using data from the test set:
```py
test_predictions = model.predict(normed_test_data).flatten()
plt.scatter(test_labels, test_predictions)
plt.xlabel('True Values [MPG]')
plt.ylabel('Predictions [MPG]')
plt.axis('equal')
plt.axis('square')
plt.xlim([0,plt.xlim()[1]])
plt.ylim([0,plt.ylim()[1]])
_ = plt.plot([-100, 100], [-100, 100])
```
![png](img/54c9e1f17ab75ca37c6360c3e5230475.png)
It looks like our model predicts reasonably well. Let's take a look at the error distribution.
```py
error = test_predictions - test_labels
plt.hist(error, bins = 25)
plt.xlabel("Prediction Error [MPG]")
_ = plt.ylabel("Count")
```
![png](img/25091cb1e90c92e9948c6c6cb9d0238b.png)
It's not quite Gaussian, but we might expect that because the number of samples is very small.
## Conclusion
This notebook introduced a few techniques for handling a regression problem.
* Mean squared error (MSE) is a common loss function used for regression problems (classification problems use different loss functions).
* Similarly, the evaluation metrics used for regression differ from those for classification. A common regression metric is mean absolute error (MAE).
* When numeric input features have values in different ranges, each feature should be scaled independently to the same range.
* If there is not much training data, one technique is to prefer a small network with few hidden layers to avoid overfitting.
* Early stopping is a useful technique to prevent overfitting.
```py
# MIT License
#
# Copyright (c) 2017 François Chollet
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
```
# Overfit and underfit
> 原文:[https://tensorflow.google.cn/tutorials/keras/overfit_and_underfit](https://tensorflow.google.cn/tutorials/keras/overfit_and_underfit)
As always, the code in this example will use the [`tf.keras`](https://tensorflow.google.cn/api_docs/python/tf/keras) API, which you can learn more about in the TensorFlow [Keras guide](https://tensorflow.google.cn/guide/keras).
In both of the previous examples—[classifying text](https://tensorflow.google.cn/tutorials/keras/text_classification_with_hub) and [predicting fuel efficiency](https://tensorflow.google.cn/tutorials/keras/regression) — we saw that the accuracy of our model on the validation data would peak after training for a number of epochs, and would then stagnate or start decreasing.
In other words, our model would *overfit* to the training data. Learning how to deal with overfitting is important. Although it's often possible to achieve high accuracy on the *training set*, what we really want is to develop models that generalize well to a *testing set* (or data they haven't seen before).
The opposite of overfitting is *underfitting*. Underfitting occurs when there is still room for improvement on the train data. This can happen for a number of reasons: If the model is not powerful enough, is over-regularized, or has simply not been trained long enough. This means the network has not learned the relevant patterns in the training data.
If you train for too long though, the model will start to overfit and learn patterns from the training data that don't generalize to the test data. We need to strike a balance. Understanding how to train for an appropriate number of epochs as we'll explore below is a useful skill.
To prevent overfitting, the best solution is to use more complete training data. The dataset should cover the full range of inputs that the model is expected to handle. Additional data may only be useful if it covers new and interesting cases.
A model trained on more complete data will naturally generalize better. When that is no longer possible, the next best solution is to use techniques like regularization. These place constraints on the quantity and type of information your model can store. If a network can only afford to memorize a small number of patterns, the optimization process will force it to focus on the most prominent patterns, which have a better chance of generalizing well.
In this notebook, we'll explore several common regularization techniques, and use them to improve on a classification model.
## Setup
Before getting started, import the necessary packages:
```py
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras import regularizers
print(tf.__version__)
```
```py
2.3.1
```
```py
!pip install -q git+https://github.com/tensorflow/docs
import tensorflow_docs as tfdocs
import tensorflow_docs.modeling
import tensorflow_docs.plots
```
```py
from IPython import display
from matplotlib import pyplot as plt
import numpy as np
import pathlib
import shutil
import tempfile
```
```py
logdir = pathlib.Path(tempfile.mkdtemp())/"tensorboard_logs"
shutil.rmtree(logdir, ignore_errors=True)
```
## The Higgs Dataset
The goal of this tutorial is not to do particle physics, so don't dwell on the details of the dataset. It contains 11,000,000 examples, each with 28 features, and a binary class label.
```py
gz = tf.keras.utils.get_file('HIGGS.csv.gz', 'http://mlphysics.ics.uci.edu/data/higgs/HIGGS.csv.gz')
```
```py
Downloading data from http://mlphysics.ics.uci.edu/data/higgs/HIGGS.csv.gz
2816409600/2816407858 [==============================] - 230s 0us/step
```
```py
FEATURES = 28
```
The [`tf.data.experimental.CsvDataset`](https://tensorflow.google.cn/api_docs/python/tf/data/experimental/CsvDataset) class can be used to read csv records directly from a gzip file with no intermediate decompression step.
```py
ds = tf.data.experimental.CsvDataset(gz,[float(),]*(FEATURES+1), compression_type="GZIP")
```
That csv reader class returns a list of scalars for each record. The following function repacks that list of scalars into a (feature_vector, label) pair.
```py
def pack_row(*row):
label = row[0]
features = tf.stack(row[1:],1)
return features, label
```
TensorFlow is most efficient when operating on large batches of data.
So instead of repacking each row individually make a new `Dataset` that takes batches of 10000-examples, applies the `pack_row` function to each batch, and then splits the batches back up into individual records:
```py
packed_ds = ds.batch(10000).map(pack_row).unbatch()
```
Have a look at some of the records from this new `packed_ds`.
The features are not perfectly normalized, but this is sufficient for this tutorial.
```py
for features,label in packed_ds.batch(1000).take(1):
print(features[0])
plt.hist(features.numpy().flatten(), bins = 101)
```
```py
tf.Tensor(
[ 0.8692932 -0.6350818 0.22569026 0.32747006 -0.6899932 0.75420225
-0.24857314 -1.0920639 0. 1.3749921 -0.6536742 0.9303491
1.1074361 1.1389043 -1.5781983 -1.0469854 0. 0.65792954
-0.01045457 -0.04576717 3.1019614 1.35376 0.9795631 0.97807616
0.92000484 0.72165745 0.98875093 0.87667835], shape=(28,), dtype=float32)
```
![png](img/b4bcda4ec74a98071e75941c07503a6c.png)
To keep this tutorial relatively short, use just the first 1,000 samples for validation, and the next 10,000 for training:
```py
N_VALIDATION = int(1e3)
N_TRAIN = int(1e4)
BUFFER_SIZE = int(1e4)
BATCH_SIZE = 500
STEPS_PER_EPOCH = N_TRAIN//BATCH_SIZE
```
The [`Dataset.skip`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#skip) and [`Dataset.take`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#take) methods make this easy.
At the same time, use the [`Dataset.cache`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#cache) method to ensure that the loader doesn't need to re-read the data from the file on each epoch:
```py
validate_ds = packed_ds.take(N_VALIDATION).cache()
train_ds = packed_ds.skip(N_VALIDATION).take(N_TRAIN).cache()
```
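If the `tf.data` semantics are unfamiliar, `skip`/`take` behave like the stdlib's `itertools.islice` over an iterable (an analogy only, not how `tf.data` is implemented):

```python
from itertools import islice

data = range(20)                      # stands in for packed_ds
validate = list(islice(data, 0, 3))   # like packed_ds.take(3)
train = list(islice(data, 3, 3 + 5))  # like packed_ds.skip(3).take(5)
print(validate, train)                # [0, 1, 2] [3, 4, 5, 6, 7]
```

`cache()` has no stdlib analogue here; it simply memoizes the parsed elements so later epochs do not re-read and re-decompress the file.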
```py
train_ds
```
```py
<CacheDataset shapes: ((28,), ()), types: (tf.float32, tf.float32)>
```
These datasets return individual examples. Use the `.batch` method to create batches of an appropriate size for training. Before batching also remember to `.shuffle` and `.repeat` the training set.
```py
validate_ds = validate_ds.batch(BATCH_SIZE)
train_ds = train_ds.shuffle(BUFFER_SIZE).repeat().batch(BATCH_SIZE)
```
## Demonstrate overfitting
The simplest way to prevent overfitting is to start with a small model: A model with a small number of learnable parameters (which is determined by the number of layers and the number of units per layer). In deep learning, the number of learnable parameters in a model is often referred to as the model's "capacity".
Intuitively, a model with more parameters will have more "memorization capacity" and therefore will be able to easily learn a perfect dictionary-like mapping between training samples and their targets, a mapping without any generalization power, but this would be useless when making predictions on previously unseen data.
Always keep this in mind: deep learning models tend to be good at fitting to the training data, but the real challenge is generalization, not fitting.
On the other hand, if the network has limited memorization resources, it will not be able to learn the mapping as easily. To minimize its loss, it will have to learn compressed representations that have more predictive power. At the same time, if you make your model too small, it will have difficulty fitting to the training data. There is a balance between "too much capacity" and "not enough capacity".
Unfortunately, there is no magical formula to determine the right size or architecture of your model (in terms of the number of layers, or the right size for each layer). You will have to experiment using a series of different architectures.
To find an appropriate model size, it's best to start with relatively few layers and parameters, then begin increasing the size of the layers or adding new layers until you see diminishing returns on the validation loss.
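"Capacity" is concrete here: it is the trainable parameter count. For a stack of `Dense` layers it can be computed by hand; this sketch checks the smallest and largest architectures trained later in this section:

```python
def layer_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

# Tiny: 28 features -> 16 units -> 1 output
tiny = layer_params(28, 16) + layer_params(16, 1)
# Large: 28 features -> four 512-unit layers -> 1 output
large = (layer_params(28, 512) + 3 * layer_params(512, 512)
         + layer_params(512, 1))
print(tiny, large)  # 481 803329
```

The large model has over 1,600 times the capacity of the tiny one, which is why it can memorize the 10,000 training examples almost immediately.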
Start with a simple model using only [`layers.Dense`](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/Dense) as a baseline, then create larger versions, and compare them.
### Training procedure
Many models train better if you gradually reduce the learning rate during training. Use [`optimizers.schedules`](https://tensorflow.google.cn/api_docs/python/tf/keras/optimizers/schedules) to reduce the learning rate over time:
```py
lr_schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
0.001,
decay_steps=STEPS_PER_EPOCH*1000,
decay_rate=1,
staircase=False)
def get_optimizer():
return tf.keras.optimizers.Adam(lr_schedule)
```
The code above sets a [`schedules.InverseTimeDecay`](https://tensorflow.google.cn/api_docs/python/tf/keras/optimizers/schedules/InverseTimeDecay) to hyperbolically decrease the learning rate to 1/2 of the base rate at 1000 epochs, 1/3 at 2000 epochs and so on.
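The decay formula is `lr = initial_lr / (1 + decay_rate * step / decay_steps)`; with `STEPS_PER_EPOCH = 20` from the constants above, `decay_steps = 20 * 1000` lets you verify the stated milestones directly (a standalone sketch of the formula, not the TensorFlow class):

```python
def inverse_time_decay(step, initial_lr=0.001, decay_steps=20 * 1000,
                       decay_rate=1.0):
    """Continuous (staircase=False) inverse-time decay of the learning rate."""
    return initial_lr / (1 + decay_rate * step / decay_steps)

# 1/2 of the base rate after 1000 epochs, 1/3 after 2000 epochs:
print(inverse_time_decay(20 * 1000))   # 0.0005
print(inverse_time_decay(40 * 1000))   # ~0.000333
```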
```py
step = np.linspace(0,100000)
lr = lr_schedule(step)
plt.figure(figsize = (8,6))
plt.plot(step/STEPS_PER_EPOCH, lr)
plt.ylim([0,max(plt.ylim())])
plt.xlabel('Epoch')
_ = plt.ylabel('Learning Rate')
```
![png](img/1d906c8d5397ad3e918d2a91fcfbb78e.png)
Each model in this tutorial will use the same training configuration. So set these up in a reusable way, starting with the list of callbacks.
The training for this tutorial runs for many short epochs. To reduce the logging noise use the `tfdocs.EpochDots` which simply prints a `.` for each epoch, and a full set of metrics every 100 epochs.
Next include [`callbacks.EarlyStopping`](https://tensorflow.google.cn/api_docs/python/tf/keras/callbacks/EarlyStopping) to avoid long and unnecessary training times. Note that this callback is set to monitor the `val_binary_crossentropy`, not the `val_loss`. This difference will be important later.
Use [`callbacks.TensorBoard`](https://tensorflow.google.cn/api_docs/python/tf/keras/callbacks/TensorBoard) to generate TensorBoard logs for the training.
```py
def get_callbacks(name):
return [
tfdocs.modeling.EpochDots(),
tf.keras.callbacks.EarlyStopping(monitor='val_binary_crossentropy', patience=200),
tf.keras.callbacks.TensorBoard(logdir/name),
]
```
Similarly each model will use the same [`Model.compile`](https://tensorflow.google.cn/api_docs/python/tf/keras/Model#compile) and [`Model.fit`](https://tensorflow.google.cn/api_docs/python/tf/keras/Model#fit) settings:
```py
def compile_and_fit(model, name, optimizer=None, max_epochs=10000):
if optimizer is None:
optimizer = get_optimizer()
model.compile(optimizer=optimizer,
loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
metrics=[
tf.keras.losses.BinaryCrossentropy(
from_logits=True, name='binary_crossentropy'),
'accuracy'])
model.summary()
history = model.fit(
train_ds,
steps_per_epoch = STEPS_PER_EPOCH,
epochs=max_epochs,
validation_data=validate_ds,
callbacks=get_callbacks(name),
verbose=0)
return history
```
### Tiny model
Start by training a model:
```py
tiny_model = tf.keras.Sequential([
layers.Dense(16, activation='elu', input_shape=(FEATURES,)),
layers.Dense(1)
])
```
```py
size_histories = {}
```
```py
size_histories['Tiny'] = compile_and_fit(tiny_model, 'sizes/Tiny')
```
```py
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 16) 464
_________________________________________________________________
dense_1 (Dense) (None, 1) 17
=================================================================
Total params: 481
Trainable params: 481
Non-trainable params: 0
_________________________________________________________________
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/ops/summary_ops_v2.py:1277: stop (from tensorflow.python.eager.profiler) is deprecated and will be removed after 2020-07-01.
Instructions for updating:
use `tf.profiler.experimental.stop` instead.
WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0032s vs `on_train_batch_end` time: 0.0255s). Check your callbacks.
Epoch: 0, accuracy:0.5092, binary_crossentropy:0.7752, loss:0.7752, val_accuracy:0.5110, val_binary_crossentropy:0.7376, val_loss:0.7376,
....................................................................................................
Epoch: 100, accuracy:0.6028, binary_crossentropy:0.6251, loss:0.6251, val_accuracy:0.5680, val_binary_crossentropy:0.6271, val_loss:0.6271,
....................................................................................................
Epoch: 200, accuracy:0.6231, binary_crossentropy:0.6137, loss:0.6137, val_accuracy:0.5920, val_binary_crossentropy:0.6146, val_loss:0.6146,
....................................................................................................
Epoch: 300, accuracy:0.6356, binary_crossentropy:0.6038, loss:0.6038, val_accuracy:0.6190, val_binary_crossentropy:0.6051, val_loss:0.6051,
....................................................................................................
Epoch: 400, accuracy:0.6470, binary_crossentropy:0.5963, loss:0.5963, val_accuracy:0.6330, val_binary_crossentropy:0.5968, val_loss:0.5968,
....................................................................................................
Epoch: 500, accuracy:0.6619, binary_crossentropy:0.5909, loss:0.5909, val_accuracy:0.6280, val_binary_crossentropy:0.5939, val_loss:0.5939,
....................................................................................................
Epoch: 600, accuracy:0.6618, binary_crossentropy:0.5872, loss:0.5872, val_accuracy:0.6630, val_binary_crossentropy:0.5910, val_loss:0.5910,
....................................................................................................
Epoch: 700, accuracy:0.6655, binary_crossentropy:0.5847, loss:0.5847, val_accuracy:0.6290, val_binary_crossentropy:0.5940, val_loss:0.5940,
....................................................................................................
Epoch: 800, accuracy:0.6683, binary_crossentropy:0.5819, loss:0.5819, val_accuracy:0.6510, val_binary_crossentropy:0.5908, val_loss:0.5908,
....................................................................................................
Epoch: 900, accuracy:0.6722, binary_crossentropy:0.5797, loss:0.5797, val_accuracy:0.6620, val_binary_crossentropy:0.5907, val_loss:0.5907,
....................................................................................................
Epoch: 1000, accuracy:0.6761, binary_crossentropy:0.5779, loss:0.5779, val_accuracy:0.6470, val_binary_crossentropy:0.5910, val_loss:0.5910,
...............................
```
Now check how the model did:
```py
plotter = tfdocs.plots.HistoryPlotter(metric = 'binary_crossentropy', smoothing_std=10)
plotter.plot(size_histories)
plt.ylim([0.5, 0.7])
```
```py
(0.5, 0.7)
```
![png](img/f865018e54d4c67ed60313c72d71e99c.png)
### Small model
To see if you can beat the performance of the `"Tiny"` model, progressively train some larger models.
Try two hidden layers with 16 units each:
```py
small_model = tf.keras.Sequential([
# `input_shape` is only required here so that `.summary` works.
layers.Dense(16, activation='elu', input_shape=(FEATURES,)),
layers.Dense(16, activation='elu'),
layers.Dense(1)
])
```
```py
size_histories['Small'] = compile_and_fit(small_model, 'sizes/Small')
```
```py
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_2 (Dense) (None, 16) 464
_________________________________________________________________
dense_3 (Dense) (None, 16) 272
_________________________________________________________________
dense_4 (Dense) (None, 1) 17
=================================================================
Total params: 753
Trainable params: 753
Non-trainable params: 0
_________________________________________________________________
WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0037s vs `on_train_batch_end` time: 0.0530s). Check your callbacks.
Epoch: 0, accuracy:0.5029, binary_crossentropy:0.7257, loss:0.7257, val_accuracy:0.4720, val_binary_crossentropy:0.6927, val_loss:0.6927,
....................................................................................................
Epoch: 100, accuracy:0.6153, binary_crossentropy:0.6185, loss:0.6185, val_accuracy:0.6290, val_binary_crossentropy:0.6112, val_loss:0.6112,
....................................................................................................
Epoch: 200, accuracy:0.6551, binary_crossentropy:0.5940, loss:0.5940, val_accuracy:0.6540, val_binary_crossentropy:0.5941, val_loss:0.5941,
....................................................................................................
Epoch: 300, accuracy:0.6678, binary_crossentropy:0.5824, loss:0.5824, val_accuracy:0.6680, val_binary_crossentropy:0.5904, val_loss:0.5904,
....................................................................................................
Epoch: 400, accuracy:0.6731, binary_crossentropy:0.5754, loss:0.5754, val_accuracy:0.6630, val_binary_crossentropy:0.5872, val_loss:0.5872,
....................................................................................................
Epoch: 500, accuracy:0.6836, binary_crossentropy:0.5679, loss:0.5679, val_accuracy:0.6740, val_binary_crossentropy:0.5834, val_loss:0.5834,
....................................................................................................
Epoch: 600, accuracy:0.6839, binary_crossentropy:0.5617, loss:0.5617, val_accuracy:0.6760, val_binary_crossentropy:0.5849, val_loss:0.5849,
....................................................................................................
```
### Medium model
Now try 3 hidden layers with 64 units each:
```py
medium_model = tf.keras.Sequential([
layers.Dense(64, activation='elu', input_shape=(FEATURES,)),
layers.Dense(64, activation='elu'),
layers.Dense(64, activation='elu'),
layers.Dense(1)
])
```
And train the model using the same data:
```py
size_histories['Medium'] = compile_and_fit(medium_model, "sizes/Medium")
```
```py
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_5 (Dense) (None, 64) 1856
_________________________________________________________________
dense_6 (Dense) (None, 64) 4160
_________________________________________________________________
dense_7 (Dense) (None, 64) 4160
_________________________________________________________________
dense_8 (Dense) (None, 1) 65
=================================================================
Total params: 10,241
Trainable params: 10,241
Non-trainable params: 0
_________________________________________________________________
WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0039s vs `on_train_batch_end` time: 0.0548s). Check your callbacks.
Epoch: 0, accuracy:0.5027, binary_crossentropy:0.6936, loss:0.6936, val_accuracy:0.5150, val_binary_crossentropy:0.6758, val_loss:0.6758,
....................................................................................................
Epoch: 100, accuracy:0.7075, binary_crossentropy:0.5382, loss:0.5382, val_accuracy:0.6670, val_binary_crossentropy:0.6027, val_loss:0.6027,
....................................................................................................
Epoch: 200, accuracy:0.7705, binary_crossentropy:0.4498, loss:0.4498, val_accuracy:0.6200, val_binary_crossentropy:0.6833, val_loss:0.6833,
...................................................................
```
### Large model
As an exercise, you can create an even larger model, and see how quickly it begins overfitting. Next, let's add to this benchmark a network that has much more capacity, far more than the problem would warrant:
```py
large_model = tf.keras.Sequential([
layers.Dense(512, activation='elu', input_shape=(FEATURES,)),
layers.Dense(512, activation='elu'),
layers.Dense(512, activation='elu'),
layers.Dense(512, activation='elu'),
layers.Dense(1)
])
```
And, again, train the model using the same data:
```py
size_histories['large'] = compile_and_fit(large_model, "sizes/large")
```
```py
Model: "sequential_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_9 (Dense) (None, 512) 14848
_________________________________________________________________
dense_10 (Dense) (None, 512) 262656
_________________________________________________________________
dense_11 (Dense) (None, 512) 262656
_________________________________________________________________
dense_12 (Dense) (None, 512) 262656
_________________________________________________________________
dense_13 (Dense) (None, 1) 513
=================================================================
Total params: 803,329
Trainable params: 803,329
Non-trainable params: 0
_________________________________________________________________
WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0041s vs `on_train_batch_end` time: 0.0613s). Check your callbacks.
Epoch: 0, accuracy:0.5072, binary_crossentropy:0.8249, loss:0.8249, val_accuracy:0.4810, val_binary_crossentropy:0.6884, val_loss:0.6884,
....................................................................................................
Epoch: 100, accuracy:1.0000, binary_crossentropy:0.0025, loss:0.0025, val_accuracy:0.6590, val_binary_crossentropy:1.8242, val_loss:1.8242,
....................................................................................................
Epoch: 200, accuracy:1.0000, binary_crossentropy:0.0001, loss:0.0001, val_accuracy:0.6590, val_binary_crossentropy:2.5014, val_loss:2.5014,
......................
```
### Plot the training and validation losses
The solid lines show the training loss, and the dashed lines show the validation loss (remember: a lower validation loss indicates a better model).
While building a larger model gives it more power, if this power is not constrained somehow it can easily overfit to the training set.
In this example, typically, only the `"Tiny"` model manages to avoid overfitting altogether, and each of the larger models overfit the data more quickly. This becomes so severe for the `"large"` model that you need to switch the plot to a log-scale to really see what's happening.
This is apparent if you plot and compare the validation metrics to the training metrics.
* It's normal for there to be a small difference.
* If both metrics are moving in the same direction, everything is fine.
* If the validation metric begins to stagnate while the training metric continues to improve, you are probably close to overfitting.
* If the validation metric is going in the wrong direction, the model is clearly overfitting.
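As a rough illustration (not part of the tutorial's code), those heuristics could be coded as a check on the two metric histories:

```python
def diagnose(train_loss, val_loss, window=3):
    """Crude overfitting check on the last `window` recorded epochs."""
    t = train_loss[-window:]
    v = val_loss[-window:]
    train_improving = t[-1] < t[0]
    val_improving = v[-1] < v[0]
    if train_improving and not val_improving:
        return 'possible overfitting'   # val stagnates or worsens while train improves
    return 'ok'

print(diagnose([0.9, 0.7, 0.5], [0.80, 0.70, 0.65]))  # ok
print(diagnose([0.5, 0.4, 0.3], [0.60, 0.62, 0.66]))  # possible overfitting
```

In practice you would smooth the curves first, as the plot below does, since single noisy epochs can trip a naive check like this.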
```py
plotter.plot(size_histories)
a = plt.xscale('log')
plt.xlim([5, max(plt.xlim())])
plt.ylim([0.5, 0.7])
plt.xlabel("Epochs [Log Scale]")
```
```py
Text(0.5, 0, 'Epochs [Log Scale]')
```
![png](img/4c173dbd57644fa57c04cf1d62ca75e4.png)
**Note:** All the above training runs used the [`callbacks.EarlyStopping`](https://tensorflow.google.cn/api_docs/python/tf/keras/callbacks/EarlyStopping) to end the training once it was clear the model was not making progress.
### View in TensorBoard
These models all wrote TensorBoard logs during training.
Open an embedded TensorBoard viewer inside a notebook:
```py
# Load the TensorBoard notebook extension
%load_ext tensorboard
# Open an embedded TensorBoard viewer
%tensorboard --logdir {logdir}/sizes
```
You can view the [results of a previous run](https://tensorboard.dev/experiment/vW7jmmF9TmKmy3rbheMQpw/#scalars&_smoothingWeight=0.97) of this notebook on [TensorBoard.dev](https://tensorboard.dev/).
TensorBoard.dev is a managed experience for hosting, tracking, and sharing ML experiments with everyone.
It's also included in an `<iframe>` for convenience:
```py
display.IFrame(
src="https://tensorboard.dev/experiment/vW7jmmF9TmKmy3rbheMQpw/#scalars&_smoothingWeight=0.97",
width="100%", height="800px")
```
If you want to share TensorBoard results you can upload the logs to [TensorBoard.dev](https://tensorboard.dev/) by copying the following into a code-cell.
**Note:** This step requires a Google account.
```py
tensorboard dev upload --logdir {logdir}/sizes
```
**Caution:** This command does not terminate. It's designed to continuously upload the results of long-running experiments. Once your data is uploaded you need to stop it using the "interrupt execution" option in your notebook tool.
## Strategies to prevent overfitting
Before getting into the content of this section, copy the training logs from the `"Tiny"` model above to use as a baseline for comparison.
```py
shutil.rmtree(logdir/'regularizers/Tiny', ignore_errors=True)
shutil.copytree(logdir/'sizes/Tiny', logdir/'regularizers/Tiny')
```
```py
PosixPath('/tmp/tmp9n203dpq/tensorboard_logs/regularizers/Tiny')
```
```py
regularizer_histories = {}
regularizer_histories['Tiny'] = size_histories['Tiny']
```
### Add weight regularization
You may be familiar with Occam's Razor principle: given two explanations for something, the explanation most likely to be correct is the "simplest" one, the one that makes the fewest assumptions. This also applies to the models learned by neural networks: given some training data and a network architecture, there are multiple sets of weight values (multiple models) that could explain the data, and simpler models are less likely to overfit than complex ones.
A "simple model" in this context is a model where the distribution of parameter values has less entropy (or a model with fewer parameters altogether, as we saw in the section above). Thus a common way to mitigate overfitting is to put constraints on the complexity of a network by forcing its weights only to take small values, which makes the distribution of weight values more "regular". This is called "weight regularization", and it is done by adding to the loss function of the network a cost associated with having large weights. This cost comes in two flavors:
* [L1 regularization](https://developers.google.cn/machine-learning/glossary/#L1_regularization), where the cost added is proportional to the absolute value of the weights coefficients (i.e. to what is called the "L1 norm" of the weights).
* [L2 regularization](https://developers.google.cn/machine-learning/glossary/#L2_regularization), where the cost added is proportional to the square of the value of the weights coefficients (i.e. to what is called the squared "L2 norm" of the weights). L2 regularization is also called weight decay in the context of neural networks. Don't let the different name confuse you: weight decay is mathematically the exact same as L2 regularization.
L1 regularization pushes weights towards exactly zero, encouraging a sparse model. L2 regularization will penalize the weight parameters without making them sparse, since the penalty goes to zero for small weights; that is one reason why L2 is more common.
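The difference is easy to see numerically. The following NumPy sketch (toy weight values, not taken from any model in this guide) computes both penalties and their gradients for a small weight matrix:

```python
import numpy as np

# Toy weight matrix (arbitrary values, for illustration only)
w = np.array([[0.5, -0.2],
              [0.0,  1.0]])

l1_cost = np.sum(np.abs(w))   # L1 penalty: sum of |w|    -> 1.7
l2_cost = np.sum(w ** 2)      # L2 penalty: sum of w**2   -> 1.29

# The gradients explain the sparsity difference: the L1 gradient has a
# constant magnitude (the sign of w), so small weights get pushed all the
# way to zero, while the L2 gradient (2*w) vanishes as a weight shrinks,
# so weights become small but rarely exactly zero.
l1_grad = np.sign(w)
l2_grad = 2 * w
```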
In [`tf.keras`](https://tensorflow.google.cn/api_docs/python/tf/keras), weight regularization is added by passing weight regularizer instances to layers as keyword arguments. Let's add L2 weight regularization now.
```py
l2_model = tf.keras.Sequential([
layers.Dense(512, activation='elu',
kernel_regularizer=regularizers.l2(0.001),
input_shape=(FEATURES,)),
layers.Dense(512, activation='elu',
kernel_regularizer=regularizers.l2(0.001)),
layers.Dense(512, activation='elu',
kernel_regularizer=regularizers.l2(0.001)),
layers.Dense(512, activation='elu',
kernel_regularizer=regularizers.l2(0.001)),
layers.Dense(1)
])
regularizer_histories['l2'] = compile_and_fit(l2_model, "regularizers/l2")
```
```py
Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_14 (Dense) (None, 512) 14848
_________________________________________________________________
dense_15 (Dense) (None, 512) 262656
_________________________________________________________________
dense_16 (Dense) (None, 512) 262656
_________________________________________________________________
dense_17 (Dense) (None, 512) 262656
_________________________________________________________________
dense_18 (Dense) (None, 1) 513
=================================================================
Total params: 803,329
Trainable params: 803,329
Non-trainable params: 0
_________________________________________________________________
WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0040s vs `on_train_batch_end` time: 0.0613s). Check your callbacks.
Epoch: 0, accuracy:0.5087, binary_crossentropy:0.8160, loss:2.3363, val_accuracy:0.4770, val_binary_crossentropy:0.6979, val_loss:2.1441,
....................................................................................................
Epoch: 100, accuracy:0.6607, binary_crossentropy:0.5920, loss:0.6163, val_accuracy:0.6530, val_binary_crossentropy:0.5831, val_loss:0.6076,
....................................................................................................
Epoch: 200, accuracy:0.6820, binary_crossentropy:0.5789, loss:0.6033, val_accuracy:0.6690, val_binary_crossentropy:0.5799, val_loss:0.6044,
....................................................................................................
Epoch: 300, accuracy:0.6865, binary_crossentropy:0.5696, loss:0.5947, val_accuracy:0.6360, val_binary_crossentropy:0.5839, val_loss:0.6088,
....................................................................................................
Epoch: 400, accuracy:0.6908, binary_crossentropy:0.5639, loss:0.5908, val_accuracy:0.6840, val_binary_crossentropy:0.5898, val_loss:0.6167,
..........................................
```
`l2(0.001)` means that every coefficient in the weight matrix of the layer will add `0.001 * weight_coefficient_value**2` to the total **loss** of the network.
That is why we're monitoring the `binary_crossentropy` directly: it doesn't have this regularization component mixed in.
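The bookkeeping can be checked by hand. A small sketch of what `l2(0.001)` contributes, using a hypothetical 2x2 kernel and a made-up data-loss value (neither comes from the model above):

```python
import numpy as np

# Hypothetical 2x2 kernel and a made-up data loss, for illustration only
w = np.array([[0.3, -0.2],
              [0.1,  0.4]])

penalty = 0.001 * np.sum(w ** 2)       # what l2(0.001) adds for this layer
binary_crossentropy = 0.58             # the data term, penalty-free
loss = binary_crossentropy + penalty   # what Keras reports as `loss`
```

This is the reason `loss` sits slightly above `binary_crossentropy` in the training log above.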
So, that same `"Large"` model with an `L2` regularization penalty performs much better:
```py
plotter.plot(regularizer_histories)
plt.ylim([0.5, 0.7])
```
```py
(0.5, 0.7)
```
![png](img/87e59b9663f1f875cba8bbc04b3ec8d7.png)
As you can see, the `"L2"` regularized model is now much more competitive with the `"Tiny"` model. This `"L2"` model is also much more resistant to overfitting than the `"Large"` model it was based on, despite having the same number of parameters.
#### More info
There are two important things to note about this sort of regularization.
**First:** if you are writing your own training loop, then you need to be sure to ask the model for its regularization losses.
```py
result = l2_model(features)
regularization_loss=tf.add_n(l2_model.losses)
```
**Second:** This implementation works by adding the weight penalties to the model's loss, and then applying a standard optimization procedure after that.
There is a second approach that instead only runs the optimizer on the raw loss, and then while applying the calculated step the optimizer also applies some weight decay. This "Decoupled Weight Decay" is seen in optimizers like `optimizers.FTRL` and [`optimizers.AdamW`](https://tensorflow.google.cn/addons/api_docs/python/tfa/optimizers/AdamW).
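The distinction can be sketched in plain Python for vanilla SGD (the helper names here are hypothetical; with plain SGD the two updates coincide up to a constant factor in the decay rate, but they genuinely differ for adaptive optimizers like Adam):

```python
def l2_in_loss_step(w, grad, lr, lam):
    # L2-in-the-loss: the optimizer sees the gradient of loss + lam * w**2,
    # i.e. the raw gradient plus a 2 * lam * w term.
    return w - lr * (grad + 2 * lam * w)

def decoupled_step(w, grad, lr, lam):
    # Decoupled weight decay: the optimizer uses the raw gradient only,
    # then the weights are shrunk directly by a separate decay term.
    return (w - lr * grad) - lr * lam * w
```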
### Add dropout
Dropout is one of the most effective and most commonly used regularization techniques for neural networks, developed by Hinton and his students at the University of Toronto.
The intuitive explanation for dropout is that because individual nodes in the network cannot rely on the output of the others, each node must output features that are useful on their own.
Dropout, applied to a layer, consists of randomly "dropping out" (i.e. set to zero) a number of output features of the layer during training. Let's say a given layer would normally have returned a vector [0.2, 0.5, 1.3, 0.8, 1.1] for a given input sample during training; after applying dropout, this vector will have a few zero entries distributed at random, e.g. [0, 0.5, 1.3, 0, 1.1].
The "dropout rate" is the fraction of the features that are being zeroed-out; it is usually set between 0.2 and 0.5. At test time, no units are dropped out, and instead the layer's output values are scaled down by a factor equal to the dropout rate, so as to balance for the fact that more units are active than at training time.
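The description above is the original formulation (scale down at test time). Modern implementations, including Keras, use the mathematically equivalent "inverted" variant, which instead scales the surviving activations up at training time so the test-time layer is a no-op. A minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, rate, training):
    # Inverted dropout: at training time a fraction `rate` of activations
    # is zeroed and the survivors are scaled up by 1/(1 - rate), so no
    # rescaling is needed at test time.
    if not training:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

x = np.array([0.2, 0.5, 1.3, 0.8, 1.1])
y = dropout(x, rate=0.4, training=True)   # some entries become exactly 0
```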
In [`tf.keras`](https://tensorflow.google.cn/api_docs/python/tf/keras) you can introduce dropout in a network via the Dropout layer, which gets applied to the output of the layer right before it.
Let's add two Dropout layers in our network to see how well they do at reducing overfitting:
```py
dropout_model = tf.keras.Sequential([
layers.Dense(512, activation='elu', input_shape=(FEATURES,)),
layers.Dropout(0.5),
layers.Dense(512, activation='elu'),
layers.Dropout(0.5),
layers.Dense(512, activation='elu'),
layers.Dropout(0.5),
layers.Dense(512, activation='elu'),
layers.Dropout(0.5),
layers.Dense(1)
])
regularizer_histories['dropout'] = compile_and_fit(dropout_model, "regularizers/dropout")
```
```py
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_19 (Dense) (None, 512) 14848
_________________________________________________________________
dropout (Dropout) (None, 512) 0
_________________________________________________________________
dense_20 (Dense) (None, 512) 262656
_________________________________________________________________
dropout_1 (Dropout) (None, 512) 0
_________________________________________________________________
dense_21 (Dense) (None, 512) 262656
_________________________________________________________________
dropout_2 (Dropout) (None, 512) 0
_________________________________________________________________
dense_22 (Dense) (None, 512) 262656
_________________________________________________________________
dropout_3 (Dropout) (None, 512) 0
_________________________________________________________________
dense_23 (Dense) (None, 1) 513
=================================================================
Total params: 803,329
Trainable params: 803,329
Non-trainable params: 0
_________________________________________________________________
WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0040s vs `on_train_batch_end` time: 0.0632s). Check your callbacks.
Epoch: 0, accuracy:0.5073, binary_crossentropy:0.7984, loss:0.7984, val_accuracy:0.5200, val_binary_crossentropy:0.6761, val_loss:0.6761,
....................................................................................................
Epoch: 100, accuracy:0.6576, binary_crossentropy:0.5965, loss:0.5965, val_accuracy:0.6730, val_binary_crossentropy:0.5833, val_loss:0.5833,
....................................................................................................
Epoch: 200, accuracy:0.6861, binary_crossentropy:0.5554, loss:0.5554, val_accuracy:0.6790, val_binary_crossentropy:0.5830, val_loss:0.5830,
....................................................................................................
Epoch: 300, accuracy:0.7280, binary_crossentropy:0.5102, loss:0.5102, val_accuracy:0.6860, val_binary_crossentropy:0.6088, val_loss:0.6088,
................
```
```py
plotter.plot(regularizer_histories)
plt.ylim([0.5, 0.7])
```
```py
(0.5, 0.7)
```
![png](img/b5a9ca25aab20c2b09a25fdab4c2b92b.png)
It's clear from this plot that both of these regularization approaches improve the behavior of the `"Large"` model. But this still doesn't beat even the `"Tiny"` baseline.
Next try them both, together, and see if that does better.
### Combined L2 + dropout
```py
combined_model = tf.keras.Sequential([
layers.Dense(512, kernel_regularizer=regularizers.l2(0.0001),
activation='elu', input_shape=(FEATURES,)),
layers.Dropout(0.5),
layers.Dense(512, kernel_regularizer=regularizers.l2(0.0001),
activation='elu'),
layers.Dropout(0.5),
layers.Dense(512, kernel_regularizer=regularizers.l2(0.0001),
activation='elu'),
layers.Dropout(0.5),
layers.Dense(512, kernel_regularizer=regularizers.l2(0.0001),
activation='elu'),
layers.Dropout(0.5),
layers.Dense(1)
])
regularizer_histories['combined'] = compile_and_fit(combined_model, "regularizers/combined")
```
```py
Model: "sequential_6"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_24 (Dense) (None, 512) 14848
_________________________________________________________________
dropout_4 (Dropout) (None, 512) 0
_________________________________________________________________
dense_25 (Dense) (None, 512) 262656
_________________________________________________________________
dropout_5 (Dropout) (None, 512) 0
_________________________________________________________________
dense_26 (Dense) (None, 512) 262656
_________________________________________________________________
dropout_6 (Dropout) (None, 512) 0
_________________________________________________________________
dense_27 (Dense) (None, 512) 262656
_________________________________________________________________
dropout_7 (Dropout) (None, 512) 0
_________________________________________________________________
dense_28 (Dense) (None, 1) 513
=================================================================
Total params: 803,329
Trainable params: 803,329
Non-trainable params: 0
_________________________________________________________________
WARNING:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0046s vs `on_train_batch_end` time: 0.0686s). Check your callbacks.
Epoch: 0, accuracy:0.5034, binary_crossentropy:0.8003, loss:0.9588, val_accuracy:0.5040, val_binary_crossentropy:0.6752, val_loss:0.8330,
....................................................................................................
Epoch: 100, accuracy:0.6514, binary_crossentropy:0.6067, loss:0.6373, val_accuracy:0.6470, val_binary_crossentropy:0.5868, val_loss:0.6173,
....................................................................................................
Epoch: 200, accuracy:0.6664, binary_crossentropy:0.5900, loss:0.6158, val_accuracy:0.6510, val_binary_crossentropy:0.5795, val_loss:0.6053,
....................................................................................................
Epoch: 300, accuracy:0.6690, binary_crossentropy:0.5822, loss:0.6104, val_accuracy:0.6940, val_binary_crossentropy:0.5611, val_loss:0.5892,
....................................................................................................
Epoch: 400, accuracy:0.6773, binary_crossentropy:0.5764, loss:0.6063, val_accuracy:0.6820, val_binary_crossentropy:0.5539, val_loss:0.5839,
....................................................................................................
Epoch: 500, accuracy:0.6840, binary_crossentropy:0.5695, loss:0.6012, val_accuracy:0.6870, val_binary_crossentropy:0.5500, val_loss:0.5818,
....................................................................................................
Epoch: 600, accuracy:0.6821, binary_crossentropy:0.5692, loss:0.6023, val_accuracy:0.6850, val_binary_crossentropy:0.5456, val_loss:0.5787,
....................................................................................................
Epoch: 700, accuracy:0.6836, binary_crossentropy:0.5678, loss:0.6021, val_accuracy:0.6870, val_binary_crossentropy:0.5502, val_loss:0.5846,
....................................................................................................
Epoch: 800, accuracy:0.6908, binary_crossentropy:0.5585, loss:0.5940, val_accuracy:0.7000, val_binary_crossentropy:0.5424, val_loss:0.5780,
....................................................................................................
Epoch: 900, accuracy:0.6931, binary_crossentropy:0.5583, loss:0.5948, val_accuracy:0.6860, val_binary_crossentropy:0.5447, val_loss:0.5813,
....................................................................................................
Epoch: 1000, accuracy:0.6919, binary_crossentropy:0.5563, loss:0.5940, val_accuracy:0.7100, val_binary_crossentropy:0.5422, val_loss:0.5799,
....................................................................................................
Epoch: 1100, accuracy:0.6914, binary_crossentropy:0.5545, loss:0.5935, val_accuracy:0.6940, val_binary_crossentropy:0.5375, val_loss:0.5765,
....................................................................................................
Epoch: 1200, accuracy:0.7012, binary_crossentropy:0.5466, loss:0.5867, val_accuracy:0.6970, val_binary_crossentropy:0.5429, val_loss:0.5831,
....................................................................................................
Epoch: 1300, accuracy:0.6939, binary_crossentropy:0.5491, loss:0.5903, val_accuracy:0.6950, val_binary_crossentropy:0.5477, val_loss:0.5890,
..
```
```py
plotter.plot(regularizer_histories)
plt.ylim([0.5, 0.7])
```
```py
(0.5, 0.7)
```
![png](img/77a7189086e1a02a870dbf630c311e5d.png)
This model with the `"Combined"` regularization is obviously the best one so far.
### View in TensorBoard
These models also recorded TensorBoard logs.
To open an embedded TensorBoard viewer inside a notebook, copy the following into a code-cell:
```py
%tensorboard --logdir {logdir}/regularizers
```
You can view the [results of a previous run](https://tensorboard.dev/experiment/fGInKDo8TXes1z7HQku9mw/#scalars&_smoothingWeight=0.97) of this notebook on [TensorBoard.dev](https://tensorboard.dev/).
It's also included in an `<iframe>` for convenience:
```py
display.IFrame(
src="https://tensorboard.dev/experiment/fGInKDo8TXes1z7HQku9mw/#scalars&_smoothingWeight=0.97",
width = "100%",
height="800px")
```
This was uploaded with:
```py
tensorboard dev upload --logdir {logdir}/regularizers
```
## Conclusions
To recap: here are the most common ways to prevent overfitting in neural networks:
* Get more training data.
* Reduce the capacity of the network.
* Add weight regularization.
* Add dropout.
Two important approaches not covered in this guide are:
* data augmentation
* batch normalization
Remember that each method can help on its own, but often combining them can be even more effective.
```py
# MIT License
#
# Copyright (c) 2017 François Chollet
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
```

# Save and load models
> Original: [https://tensorflow.google.cn/tutorials/keras/save_and_load](https://tensorflow.google.cn/tutorials/keras/save_and_load)
Models can be saved during and after training. This means a model can resume where it left off and avoid long training times. Saving also means you can share your model, and others can recreate your work. When publishing research models and techniques, most machine learning practitioners share:
* code to create the model, and
* the trained weights, or parameters, for the model.
Sharing this data helps others understand how the model works and try it themselves with new data.
Caution: Be careful with untrusted code. TensorFlow models are code. See [Using TensorFlow Securely](https://github.com/tensorflow/tensorflow/blob/master/SECURITY.md) for details.
### Options
There are different ways to save TensorFlow models, depending on the API you're using. This guide uses [tf.keras](https://tensorflow.google.cn/guide/keras), a high-level API for building and training models in TensorFlow. For other approaches, see the TensorFlow [Save and Restore](https://tensorflow.google.cn/guide/saved_model) guide or [Saving in eager](https://tensorflow.google.cn/guide/eager#object-based_saving).
## Setup
### Installs and imports
Install and import TensorFlow and dependencies:
```py
pip install -q pyyaml h5py  # Required to save models in HDF5 format
```
```py
WARNING: You are using pip version 20.2.2; however, version 20.2.3 is available.
You should consider upgrading via the '/tmpfs/src/tf_docs_env/bin/python -m pip install --upgrade pip' command.
```
```py
import os
import tensorflow as tf
from tensorflow import keras
print(tf.version.VERSION)
```
```py
2.3.0
```
### Get an example dataset
To demonstrate how to save and load weights, you'll use the [MNIST dataset](http://yann.lecun.com/exdb/mnist/). To speed up these runs, use the first 1000 examples:
```py
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
train_labels = train_labels[:1000]
test_labels = test_labels[:1000]
train_images = train_images[:1000].reshape(-1, 28 * 28) / 255.0
test_images = test_images[:1000].reshape(-1, 28 * 28) / 255.0
```
### Define a model
Start by building a simple sequential model:
```py
# Define a simple sequential model
def create_model():
model = tf.keras.models.Sequential([
keras.layers.Dense(512, activation='relu', input_shape=(784,)),
keras.layers.Dropout(0.2),
keras.layers.Dense(10)
])
model.compile(optimizer='adam',
loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
return model
# Create a basic model instance
model = create_model()
# Display the model's architecture
model.summary()
```
```py
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 512) 401920
_________________________________________________________________
dropout (Dropout) (None, 512) 0
_________________________________________________________________
dense_1 (Dense) (None, 10) 5130
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
```
## Save checkpoints during training
You can use a trained model without having to retrain it, or pick up training where you left off in case the training process was interrupted. The [`tf.keras.callbacks.ModelCheckpoint`](https://tensorflow.google.cn/api_docs/python/tf/keras/callbacks/ModelCheckpoint) callback allows you to continually save the model both *during* and at *the end of* training.
### Checkpoint callback usage
Create a [`tf.keras.callbacks.ModelCheckpoint`](https://tensorflow.google.cn/api_docs/python/tf/keras/callbacks/ModelCheckpoint) callback that saves weights only during training:
```py
checkpoint_path = "training_1/cp.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
# Create a callback that saves the model's weights
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
save_weights_only=True,
verbose=1)
# Train the model with the new callback
model.fit(train_images,
train_labels,
epochs=10,
validation_data=(test_images,test_labels),
callbacks=[cp_callback])  # Pass callback to training
# This may generate warnings related to saving the state of the optimizer.
# These warnings (and similar warnings throughout this notebook)
# are in place to discourage outdated usage, and can be ignored.
```
```py
Epoch 1/10
29/32 [==========================>...] - ETA: 0s - loss: 1.1844 - accuracy: 0.6595
Epoch 00001: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 8ms/step - loss: 1.1300 - accuracy: 0.6770 - val_loss: 0.7189 - val_accuracy: 0.7780
Epoch 2/10
30/32 [===========================>..] - ETA: 0s - loss: 0.4232 - accuracy: 0.8792
Epoch 00002: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 5ms/step - loss: 0.4216 - accuracy: 0.8800 - val_loss: 0.5160 - val_accuracy: 0.8470
Epoch 3/10
29/32 [==========================>...] - ETA: 0s - loss: 0.2964 - accuracy: 0.9149
Epoch 00003: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 4ms/step - loss: 0.2988 - accuracy: 0.9170 - val_loss: 0.4753 - val_accuracy: 0.8560
Epoch 4/10
29/32 [==========================>...] - ETA: 0s - loss: 0.2057 - accuracy: 0.9494
Epoch 00004: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 4ms/step - loss: 0.2086 - accuracy: 0.9500 - val_loss: 0.4375 - val_accuracy: 0.8600
Epoch 5/10
29/32 [==========================>...] - ETA: 0s - loss: 0.1512 - accuracy: 0.9666
Epoch 00005: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 4ms/step - loss: 0.1488 - accuracy: 0.9680 - val_loss: 0.4275 - val_accuracy: 0.8660
Epoch 6/10
30/32 [===========================>..] - ETA: 0s - loss: 0.1130 - accuracy: 0.9823
Epoch 00006: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 4ms/step - loss: 0.1134 - accuracy: 0.9820 - val_loss: 0.4309 - val_accuracy: 0.8630
Epoch 7/10
29/32 [==========================>...] - ETA: 0s - loss: 0.0829 - accuracy: 0.9925
Epoch 00007: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 4ms/step - loss: 0.0838 - accuracy: 0.9920 - val_loss: 0.4079 - val_accuracy: 0.8680
Epoch 8/10
29/32 [==========================>...] - ETA: 0s - loss: 0.0624 - accuracy: 0.9946
Epoch 00008: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 4ms/step - loss: 0.0627 - accuracy: 0.9950 - val_loss: 0.4176 - val_accuracy: 0.8690
Epoch 9/10
29/32 [==========================>...] - ETA: 0s - loss: 0.0520 - accuracy: 0.9946
Epoch 00009: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 4ms/step - loss: 0.0508 - accuracy: 0.9950 - val_loss: 0.4600 - val_accuracy: 0.8450
Epoch 10/10
29/32 [==========================>...] - ETA: 0s - loss: 0.0462 - accuracy: 0.9968
Epoch 00010: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 4ms/step - loss: 0.0459 - accuracy: 0.9970 - val_loss: 0.4378 - val_accuracy: 0.8660
<tensorflow.python.keras.callbacks.History at 0x7fe7b286b710>
```
This creates a single collection of TensorFlow checkpoint files that are updated at the end of each epoch:
```py
ls {checkpoint_dir}
```
```py
checkpoint cp.ckpt.data-00000-of-00001 cp.ckpt.index
```
Create a new, untrained model. When restoring a model from weights-only, you must have a model with the same architecture as the original model. Since it's the same model architecture, you can share weights despite its being a different *instance* of the model. Now rebuild a fresh, untrained model and evaluate it on the test set. An untrained model will perform at chance levels (~10% accuracy):
```py
# Create a basic model instance
model = create_model()
# Evaluate the model
loss, acc = model.evaluate(test_images, test_labels, verbose=2)
print("Untrained model, accuracy: {:5.2f}%".format(100*acc))
```
```py
32/32 - 0s - loss: 2.3734 - accuracy: 0.0990
Untrained model, accuracy: 9.90%
```
Then load the weights from the checkpoint and re-evaluate:
```py
# Load the weights
model.load_weights(checkpoint_path)
# Re-evaluate the model
loss,acc = model.evaluate(test_images, test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))
```
```py
32/32 - 0s - loss: 0.4378 - accuracy: 0.8660
Restored model, accuracy: 86.60%
```
### Checkpoint callback options
The callback provides several options to give checkpoints unique names and to adjust the checkpointing frequency.
Train a new model, and save uniquely named checkpoints once every five epochs:
```py
# Include the epoch in the file name (uses `str.format`)
checkpoint_path = "training_2/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
# Create a callback that saves the model's weights every 5 epochs
cp_callback = tf.keras.callbacks.ModelCheckpoint(
filepath=checkpoint_path,
verbose=1,
save_weights_only=True,
period=5)
# Create a new model instance
model = create_model()
# Save the weights using the `checkpoint_path` format
model.save_weights(checkpoint_path.format(epoch=0))
# Train the model with the new callback
model.fit(train_images,
train_labels,
epochs=50,
callbacks=[cp_callback],
validation_data=(test_images,test_labels),
verbose=0)
```
```py
WARNING:tensorflow:`period` argument is deprecated. Please use `save_freq` to specify the frequency in number of batches seen.
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.iter
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.beta_1
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.beta_2
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.decay
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.learning_rate
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
Epoch 00005: saving model to training_2/cp-0005.ckpt
Epoch 00010: saving model to training_2/cp-0010.ckpt
Epoch 00015: saving model to training_2/cp-0015.ckpt
Epoch 00020: saving model to training_2/cp-0020.ckpt
Epoch 00025: saving model to training_2/cp-0025.ckpt
Epoch 00030: saving model to training_2/cp-0030.ckpt
Epoch 00035: saving model to training_2/cp-0035.ckpt
Epoch 00040: saving model to training_2/cp-0040.ckpt
Epoch 00045: saving model to training_2/cp-0045.ckpt
Epoch 00050: saving model to training_2/cp-0050.ckpt
<tensorflow.python.keras.callbacks.History at 0x7fe8021c76a0>
```
Now, review the resulting checkpoints and choose the latest one:
```py
ls {checkpoint_dir}
```
```py
checkpoint cp-0025.ckpt.index
cp-0000.ckpt.data-00000-of-00001 cp-0030.ckpt.data-00000-of-00001
cp-0000.ckpt.index cp-0030.ckpt.index
cp-0005.ckpt.data-00000-of-00001 cp-0035.ckpt.data-00000-of-00001
cp-0005.ckpt.index cp-0035.ckpt.index
cp-0010.ckpt.data-00000-of-00001 cp-0040.ckpt.data-00000-of-00001
cp-0010.ckpt.index cp-0040.ckpt.index
cp-0015.ckpt.data-00000-of-00001 cp-0045.ckpt.data-00000-of-00001
cp-0015.ckpt.index cp-0045.ckpt.index
cp-0020.ckpt.data-00000-of-00001 cp-0050.ckpt.data-00000-of-00001
cp-0020.ckpt.index cp-0050.ckpt.index
cp-0025.ckpt.data-00000-of-00001
```
```py
latest = tf.train.latest_checkpoint(checkpoint_dir)
latest
```
```py
'training_2/cp-0050.ckpt'
```
Note: the default TensorFlow format only saves the 5 most recent checkpoints.
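This limit comes from the checkpoint-management machinery rather than from the file format itself. A minimal sketch using `tf.train.CheckpointManager` with a toy checkpoint object (the `step` variable is hypothetical) shows the same `max_to_keep` behavior:

```python
import tempfile

import tensorflow as tf

# Toy checkpoint object tracking a single variable
ckpt = tf.train.Checkpoint(step=tf.Variable(0))
manager = tf.train.CheckpointManager(
    ckpt, tempfile.mkdtemp(), max_to_keep=5)

# Save 7 checkpoints; only the 5 newest remain on disk
for _ in range(7):
    ckpt.step.assign_add(1)
    manager.save()
```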
To test, reset the model and load the latest checkpoint:
```py
# Create a new model instance
model = create_model()
# Load the previously saved weights
model.load_weights(latest)
# Re-evaluate the model
loss, acc = model.evaluate(test_images, test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))
```
```py
32/32 - 0s - loss: 0.4836 - accuracy: 0.8750
Restored model, accuracy: 87.50%
```
## What are these files?
The above code stores the weights to a collection of [checkpoint](https://tensorflow.google.cn/guide/saved_model#save_and_restore_variables)-formatted files that contain only the trained weights in a binary format. Checkpoints contain:
* One or more shards that contain your model's weights.
* An index file that indicates which weights are stored in which shard.
If you are only training a model on a single machine, you'll have one shard with the suffix: `.data-00000-of-00001`
## Manually save weights
You saw how to load the weights into a model. Manually saving them is just as simple with the [`Model.save_weights`](https://tensorflow.google.cn/api_docs/python/tf/keras/Model#save_weights) method. By default, [`tf.keras`](https://tensorflow.google.cn/api_docs/python/tf/keras), and `save_weights` in particular, uses the TensorFlow [checkpoint](https://tensorflow.google.cn/guide/keras/checkpoints) format with a `.ckpt` extension (saving in [HDF5](https://js.tensorflow.org/tutorials/import-keras.html) with a `.h5` extension is covered in the guide [Save and serialize models](https://tensorflow.google.cn/guide/keras/save_and_serialize#weights_only_saving_in_savedmodel_format)):
```py
# Save the weights
model.save_weights('./checkpoints/my_checkpoint')
# Create a new model instance
model = create_model()
# Restore the weights
model.load_weights('./checkpoints/my_checkpoint')
# Evaluate the model
loss,acc = model.evaluate(test_images, test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100*acc))
```
```py
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.iter
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.beta_1
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.beta_2
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.decay
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.learning_rate
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
32/32 - 0s - loss: 0.4836 - accuracy: 0.8750
Restored model, accuracy: 87.50%
```
## 保存整个模型
调用 [`model.save`](https://tensorflow.google.cn/api_docs/python/tf/keras/Model#save) 可将模型的结构、权重和训练配置保存到单个文件/文件夹中。这样您就可以导出模型,以便在不访问原始 Python 代码*的情况下使用它。由于优化器状态(optimizer-state)也被恢复,您可以从中断的位置继续训练。
整个模型可以以两种不同的文件格式(`SavedModel``HDF5`)进行保存。需要注意的是 TensorFlow 的 `SavedModel` 格式是 TF2.x. 中的默认文件格式。但是,模型仍可以以 `HDF5` 格式保存。下面介绍了以两种文件格式保存整个模型的更多详细信息。
保存完整模型会非常有用——您可以在 TensorFlow.js([Saved Model](https://tensorflow.google.cn/js/tutorials/conversion/import_saved_model), [HDF5](https://tensorflow.google.cn/js/tutorials/conversion/import_keras))中加载它们,然后在 web 浏览器中训练和运行它们,或者使用 TensorFlow Lite 将它们转换为可在移动设备上运行的格式([Saved Model](https://tensorflow.google.cn/lite/convert/python_api#converting_a_savedmodel_), [HDF5](https://tensorflow.google.cn/lite/convert/python_api#converting_a_keras_model_))。
*自定义对象(例如,子类化模型或层)在保存和加载时需要特别注意。请参阅下面的**保存自定义对象**部分
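Keras 会根据传给 `model.save` 的路径选择保存格式:带 HDF5 扩展名的路径保存为 HDF5,否则在 TF2.x 中默认保存为 SavedModel。下面用纯 Python 示意扩展名与格式的对应关系(这并非 Keras 的实际实现,仅为说明):

```py
def infer_save_format(filepath):
  """示意:根据文件扩展名推断 model.save 使用的格式(非 Keras 实际实现)。"""
  if filepath.endswith(('.h5', '.hdf5')):
    return 'h5'   # HDF5 单文件格式
  return 'tf'     # SavedModel 目录格式(TF2.x 默认)

print(infer_save_format('my_model.h5'))           # h5
print(infer_save_format('saved_model/my_model'))  # tf
```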
### SavedModel 格式
SavedModel 格式是序列化模型的另一种方法。以这种格式保存的模型,可以使用 [`tf.keras.models.load_model`](https://tensorflow.google.cn/api_docs/python/tf/keras/models/load_model) 还原,并且模型与 TensorFlow Serving 兼容。[SavedModel 指南](https://tensorflow.google.cn/guide/saved_model)详细介绍了如何提供/检查 SavedModel。以下部分说明了保存和还原模型的步骤。
```py
# 创建并训练一个新的模型实例。
model = create_model()
model.fit(train_images, train_labels, epochs=5)
# 将整个模型另存为 SavedModel。
!mkdir -p saved_model
model.save('saved_model/my_model')
```
```py
Epoch 1/5
32/32 [==============================] - 0s 2ms/step - loss: 1.1705 - accuracy: 0.6690
Epoch 2/5
32/32 [==============================] - 0s 2ms/step - loss: 0.4326 - accuracy: 0.8780
Epoch 3/5
32/32 [==============================] - 0s 2ms/step - loss: 0.2910 - accuracy: 0.9190
Epoch 4/5
32/32 [==============================] - 0s 2ms/step - loss: 0.2045 - accuracy: 0.9520
Epoch 5/5
32/32 [==============================] - 0s 2ms/step - loss: 0.1538 - accuracy: 0.9650
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/training/tracking/tracking.py:111: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/training/tracking/tracking.py:111: Layer.updates (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
INFO:tensorflow:Assets written to: saved_model/my_model/assets
```
SavedModel 格式是一个包含 protobuf 二进制文件和 TensorFlow checkpoint(检查点)的目录。检查保存的模型目录:
```py
# my_model 文件夹
ls saved_model
# 包含一个 assets 文件夹saved_model.pb和变量文件夹。
ls saved_model/my_model
```
```py
my_model
assets saved_model.pb variables
```
从保存的模型重新加载新的 Keras 模型:
```py
new_model = tf.keras.models.load_model('saved_model/my_model')
# 检查其架构
new_model.summary()
```
```py
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_10 (Dense) (None, 512) 401920
_________________________________________________________________
dropout_5 (Dropout) (None, 512) 0
_________________________________________________________________
dense_11 (Dense) (None, 10) 5130
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
```
还原的模型使用与原始模型相同的参数进行编译。 尝试使用加载的模型运行评估和预测:
```py
# 评估还原的模型
loss, acc = new_model.evaluate(test_images, test_labels, verbose=2)
print('Restored model, accuracy: {:5.2f}%'.format(100*acc))
print(new_model.predict(test_images).shape)
```
```py
32/32 - 0s - loss: 0.4630 - accuracy: 0.0890
Restored model, accuracy: 8.90%
(1000, 10)
```
### HDF5 格式
Keras 使用 [HDF5](https://en.wikipedia.org/wiki/Hierarchical_Data_Format) 标准提供了一种基本的保存格式。
```py
# 创建并训练一个新的模型实例
model = create_model()
model.fit(train_images, train_labels, epochs=5)
# 将整个模型保存为 HDF5 文件。
# '.h5' 扩展名指示应将模型保存到 HDF5。
model.save('my_model.h5')
```
```py
Epoch 1/5
32/32 [==============================] - 0s 2ms/step - loss: 1.1465 - accuracy: 0.6560
Epoch 2/5
32/32 [==============================] - 0s 2ms/step - loss: 0.4152 - accuracy: 0.8850
Epoch 3/5
32/32 [==============================] - 0s 2ms/step - loss: 0.2801 - accuracy: 0.9280
Epoch 4/5
32/32 [==============================] - 0s 2ms/step - loss: 0.2108 - accuracy: 0.9480
Epoch 5/5
32/32 [==============================] - 0s 2ms/step - loss: 0.1520 - accuracy: 0.9660
```
现在,从该文件重新创建模型:
```py
# 重新创建完全相同的模型,包括其权重和优化程序
new_model = tf.keras.models.load_model('my_model.h5')
# 显示网络结构
new_model.summary()
```
```py
Model: "sequential_6"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_12 (Dense) (None, 512) 401920
_________________________________________________________________
dropout_6 (Dropout) (None, 512) 0
_________________________________________________________________
dense_13 (Dense) (None, 10) 5130
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
```
检查其准确率accuracy
```py
loss, acc = new_model.evaluate(test_images, test_labels, verbose=2)
print('Restored model, accuracy: {:5.2f}%'.format(100*acc))
```
```py
32/32 - 0s - loss: 0.4639 - accuracy: 0.0840
Restored model, accuracy: 8.40%
```
Keras 通过检查网络结构来保存模型。这项技术可以保存一切:
* 权重值
* 模型的架构
* 模型的训练配置(您传递给编译的内容)
* 优化器及其状态(如果有的话)(这使您可以在中断的地方重新开始训练)
Keras 无法保存 `v1.x` 优化器(来自 [`tf.compat.v1.train`](https://tensorflow.google.cn/api_docs/python/tf/compat/v1/train)),因为它们与 checkpoint 不兼容。对于 v1.x 优化器,您需要在加载模型后重新编译,这会丢失优化器的状态。
### 保存自定义对象
如果使用的是 SavedModel 格式则可以跳过此部分。HDF5 和 SavedModel 之间的主要区别在于HDF5 使用对象配置保存模型结构,而 SavedModel 保存执行图。因此SavedModel 能够保存自定义对象,例如子类化模型和自定义层,而无需原始代码。
要将自定义对象保存到 HDF5必须执行以下操作:
1. 在对象中定义一个 `get_config` 方法,以及可选的 `from_config` 类方法。
* `get_config(self)` 返回重新创建对象所需的参数的 JSON 可序列化字典。
* `from_config(cls, config)` 使用从 get_config 返回的 config 来创建一个新对象。默认情况下,此函数将使用 config 作为初始化 kwargs`return cls(**config)`)。
2. 加载模型时,将对象传递给 `custom_objects` 参数。参数必须是将字符串类名称映射到 Python 类的字典。例如,`tf.keras.models.load_model(path, custom_objects={'CustomLayer': CustomLayer})`
有关自定义对象和 `get_config` 的示例,请参见[从头开始编写层和模型](https://tensorflow.google.cn/guide/keras/custom_layers_and_models)教程。
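下面用一个不依赖 TensorFlow 的最小 Python 类示意 `get_config`/`from_config` 协议本身(它不是真实的 Keras 层,仅演示"配置字典可 JSON 序列化、`from_config` 用它重建对象"这一约定):

```py
import json

class CustomLayer:
  """仅演示 get_config / from_config 协议的示意类,并非真实 Keras 层。"""

  def __init__(self, units=32, activation='relu'):
    self.units = units
    self.activation = activation

  def get_config(self):
    # 返回重新创建对象所需、且可 JSON 序列化的参数字典
    return {'units': self.units, 'activation': self.activation}

  @classmethod
  def from_config(cls, config):
    # 默认行为:把 config 作为关键字参数传给构造函数
    return cls(**config)

# 序列化 -> 反序列化往返
config_json = json.dumps(CustomLayer(units=64).get_config())
restored = CustomLayer.from_config(json.loads(config_json))
print(restored.units, restored.activation)  # 64 relu
```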
```py
# MIT License
#
# Copyright (c) 2017 François Chollet
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
```
# Introduction to the Keras Tuner
> 原文:[https://tensorflow.google.cn/tutorials/keras/keras_tuner](https://tensorflow.google.cn/tutorials/keras/keras_tuner)
## Overview
The Keras Tuner is a library that helps you pick the optimal set of hyperparameters for your TensorFlow program. The process of selecting the right set of hyperparameters for your machine learning (ML) application is called *hyperparameter tuning* or *hypertuning*.
Hyperparameters are the variables that govern the training process and the topology of an ML model. These variables remain constant over the training process and directly impact the performance of your ML program. Hyperparameters are of two types:
1. **Model hyperparameters** which influence model selection such as the number and width of hidden layers
2. **Algorithm hyperparameters** which influence the speed and quality of the learning algorithm such as the learning rate for Stochastic Gradient Descent (SGD) and the number of nearest neighbors for a k Nearest Neighbors (KNN) classifier
In this tutorial, you will use the Keras Tuner to perform hypertuning for an image classification application.
## Setup
```py
import tensorflow as tf
from tensorflow import keras
import IPython
```
Install and import the Keras Tuner.
```py
!pip install -q -U keras-tuner
import kerastuner as kt
```
```py
WARNING: You are using pip version 20.2.2; however, version 20.2.3 is available.
You should consider upgrading via the '/tmpfs/src/tf_docs_env/bin/python -m pip install --upgrade pip' command.
```
## Download and prepare the dataset
In this tutorial, you will use the Keras Tuner to find the best hyperparameters for a machine learning model that classifies images of clothing from the [Fashion MNIST dataset](https://github.com/zalandoresearch/fashion-mnist).
Load the data.
```py
(img_train, label_train), (img_test, label_test) = keras.datasets.fashion_mnist.load_data()
```
```py
# Normalize pixel values between 0 and 1
img_train = img_train.astype('float32') / 255.0
img_test = img_test.astype('float32') / 255.0
```
## Define the model
When you build a model for hypertuning, you also define the hyperparameter search space in addition to the model architecture. The model you set up for hypertuning is called a *hypermodel*.
You can define a hypermodel through two approaches:
* By using a model builder function
* By subclassing the `HyperModel` class of the Keras Tuner API
You can also use two pre-defined `HyperModel` classes - [HyperXception](https://keras-team.github.io/keras-tuner/documentation/hypermodels/#hyperxception-class) and [HyperResNet](https://keras-team.github.io/keras-tuner/documentation/hypermodels/#hyperresnet-class) for computer vision applications.
In this tutorial, you use a model builder function to define the image classification model. The model builder function returns a compiled model and uses hyperparameters you define inline to hypertune the model.
```py
def model_builder(hp):
  model = keras.Sequential()
  model.add(keras.layers.Flatten(input_shape=(28, 28)))

  # Tune the number of units in the first Dense layer
  # Choose an optimal value between 32-512
  hp_units = hp.Int('units', min_value = 32, max_value = 512, step = 32)
  model.add(keras.layers.Dense(units = hp_units, activation = 'relu'))
  model.add(keras.layers.Dense(10))

  # Tune the learning rate for the optimizer
  # Choose an optimal value from 0.01, 0.001, or 0.0001
  hp_learning_rate = hp.Choice('learning_rate', values = [1e-2, 1e-3, 1e-4])

  model.compile(optimizer = keras.optimizers.Adam(learning_rate = hp_learning_rate),
                loss = keras.losses.SparseCategoricalCrossentropy(from_logits = True),
                metrics = ['accuracy'])

  return model
```
## Instantiate the tuner and perform hypertuning
Instantiate the tuner to perform the hypertuning. The Keras Tuner has four tuners available - `RandomSearch`, `Hyperband`, `BayesianOptimization`, and `Sklearn`. In this tutorial, you use the [Hyperband](https://arxiv.org/pdf/1603.06560.pdf) tuner.
To instantiate the Hyperband tuner, you must specify the hypermodel, the `objective` to optimize and the maximum number of epochs to train (`max_epochs`).
```py
tuner = kt.Hyperband(model_builder,
                     objective = 'val_accuracy',
                     max_epochs = 10,
                     factor = 3,
                     directory = 'my_dir',
                     project_name = 'intro_to_kt')
```
The Hyperband tuning algorithm uses adaptive resource allocation and early-stopping to quickly converge on a high-performing model. This is done using a sports championship style bracket. The algorithm trains a large number of models for a few epochs and carries forward only the top-performing half of models to the next round. Hyperband determines the number of models to train in a bracket by computing 1 + log<sub>`factor`</sub>(`max_epochs`) and rounding it up to the nearest integer.
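The bracket count described above can be computed directly. This is a small sketch of the stated formula (not Keras Tuner's internal code), using the `max_epochs=10`, `factor=3` values from the tuner above:

```py
import math

def hyperband_brackets(max_epochs, factor):
  # 1 + log_factor(max_epochs), rounded up to the nearest integer
  return math.ceil(1 + math.log(max_epochs) / math.log(factor))

print(hyperband_brackets(10, 3))  # 4
```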
Before running the hyperparameter search, define a callback to clear the training outputs at the end of every training step.
```py
class ClearTrainingOutput(tf.keras.callbacks.Callback):
  def on_train_end(*args, **kwargs):
    IPython.display.clear_output(wait = True)
```
Run the hyperparameter search. The arguments for the search method are the same as those used for [`tf.keras.Model.fit`](https://tensorflow.google.cn/api_docs/python/tf/keras/Model#fit), in addition to the callback above.
```py
tuner.search(img_train, label_train, epochs = 10, validation_data = (img_test, label_test), callbacks = [ClearTrainingOutput()])
# Get the optimal hyperparameters
best_hps = tuner.get_best_hyperparameters(num_trials = 1)[0]
print(f"""
The hyperparameter search is complete. The optimal number of units in the first densely-connected
layer is {best_hps.get('units')} and the optimal learning rate for the optimizer
is {best_hps.get('learning_rate')}.
""")
```
```py
Epoch 3/4
911/1875 [=============>................] - ETA: 1s - loss: 0.5757 - accuracy: 0.8040
```
To finish this tutorial, retrain the model with the optimal hyperparameters from the search.
```py
# Build the model with the optimal hyperparameters and train it on the data
model = tuner.hypermodel.build(best_hps)
model.fit(img_train, label_train, epochs = 10, validation_data = (img_test, label_test))
```
```py
Epoch 1/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.4787 - accuracy: 0.8303 - val_loss: 0.4199 - val_accuracy: 0.8509
Epoch 2/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3600 - accuracy: 0.8684 - val_loss: 0.3902 - val_accuracy: 0.8570
Epoch 3/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3253 - accuracy: 0.8794 - val_loss: 0.3670 - val_accuracy: 0.8689
Epoch 4/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3038 - accuracy: 0.8874 - val_loss: 0.3714 - val_accuracy: 0.8684
Epoch 5/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2842 - accuracy: 0.8939 - val_loss: 0.3527 - val_accuracy: 0.8758
Epoch 6/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2678 - accuracy: 0.9005 - val_loss: 0.3334 - val_accuracy: 0.8785
Epoch 7/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2533 - accuracy: 0.9055 - val_loss: 0.3277 - val_accuracy: 0.8834
Epoch 8/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2445 - accuracy: 0.9089 - val_loss: 0.3487 - val_accuracy: 0.8768
Epoch 9/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2352 - accuracy: 0.9116 - val_loss: 0.3352 - val_accuracy: 0.8843
Epoch 10/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2260 - accuracy: 0.9145 - val_loss: 0.3457 - val_accuracy: 0.8814
<tensorflow.python.keras.callbacks.History at 0x7f1f802512b0>
```
The `my_dir/intro_to_kt` directory contains detailed logs and checkpoints for every trial (model configuration) run during the hyperparameter search. If you re-run the hyperparameter search, the Keras Tuner uses the existing state from these logs to resume the search. To disable this behavior, pass an additional `overwrite = True` argument while instantiating the tuner.
## Summary
In this tutorial, you learned how to use the Keras Tuner to tune hyperparameters for a model. To learn more about the Keras Tuner, check out these additional resources:
* [Keras Tuner on the TensorFlow blog](https://blog.tensorflow.org/2020/01/hyperparameter-tuning-with-keras-tuner.html)
* [Keras Tuner website](https://keras-team.github.io/keras-tuner/)
Also check out the [HParams Dashboard](https://tensorflow.google.cn/tensorboard/hyperparameter_tuning_with_hparams) in TensorBoard to interactively tune your model hyperparameters.
# 加载和预处理数据
# 用 tf.data 加载图片
> 原文:[https://tensorflow.google.cn/tutorials/load_data/images](https://tensorflow.google.cn/tutorials/load_data/images)
**Note:** 我们的 TensorFlow 社区翻译了这些文档。因为社区翻译是尽力而为, 所以无法保证它们是最准确的,并且反映了最新的 [官方英文文档](https://tensorflow.google.cn/?hl=en)。如果您有改进此翻译的建议, 请提交 pull request 到 [tensorflow/docs](https://github.com/tensorflow/docs) GitHub 仓库。要志愿地撰写或者审核译文,请加入 [docs-zh-cn@tensorflow.org Google Group](https://groups.google.com/a/tensorflow.org/forum/#!forum/docs-zh-cn)。
本教程提供一个如何使用 [`tf.data`](https://tensorflow.google.cn/api_docs/python/tf/data) 加载图片的简单例子。
本例中使用的数据集分布在图片文件夹中,一个文件夹含有一类图片。
## 配置
```py
import tensorflow as tf
```
```py
AUTOTUNE = tf.data.experimental.AUTOTUNE
```
## 下载并检查数据集
### 检索图片
在开始任何训练之前,你需要一组图片来教会网络你想要识别的新类别。这里已经准备好了一个归档文件,其中存储了最初使用的、拥有知识共享许可的花卉照片。
```py
import pathlib
data_root_orig = tf.keras.utils.get_file(origin='https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
                                         fname='flower_photos', untar=True)
data_root = pathlib.Path(data_root_orig)
print(data_root)
```
```py
Downloading data from https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz
228818944/228813984 [==============================] - 2s 0us/step
/home/kbuilder/.keras/datasets/flower_photos
```
下载了 218 MB 之后,你现在应该有花卉照片副本:
```py
for item in data_root.iterdir():
  print(item)
```
```py
/home/kbuilder/.keras/datasets/flower_photos/sunflowers
/home/kbuilder/.keras/datasets/flower_photos/daisy
/home/kbuilder/.keras/datasets/flower_photos/LICENSE.txt
/home/kbuilder/.keras/datasets/flower_photos/roses
/home/kbuilder/.keras/datasets/flower_photos/tulips
/home/kbuilder/.keras/datasets/flower_photos/dandelion
```
```py
import random
all_image_paths = list(data_root.glob('*/*'))
all_image_paths = [str(path) for path in all_image_paths]
random.shuffle(all_image_paths)
image_count = len(all_image_paths)
image_count
```
```py
3670
```
```py
all_image_paths[:10]
```
```py
['/home/kbuilder/.keras/datasets/flower_photos/daisy/4820415253_15bc3b6833_n.jpg',
'/home/kbuilder/.keras/datasets/flower_photos/roses/14172324538_2147808483_n.jpg',
'/home/kbuilder/.keras/datasets/flower_photos/sunflowers/15054866658_c1a6223403_m.jpg',
'/home/kbuilder/.keras/datasets/flower_photos/daisy/422094774_28acc69a8b_n.jpg',
'/home/kbuilder/.keras/datasets/flower_photos/roses/22982871191_ec61e36939_n.jpg',
'/home/kbuilder/.keras/datasets/flower_photos/tulips/8673416166_620fc18e2f_n.jpg',
'/home/kbuilder/.keras/datasets/flower_photos/tulips/16582481123_06e8e6b966_n.jpg',
'/home/kbuilder/.keras/datasets/flower_photos/daisy/5434914569_e9b982fde0_n.jpg',
'/home/kbuilder/.keras/datasets/flower_photos/sunflowers/184682652_c927a49226_m.jpg',
'/home/kbuilder/.keras/datasets/flower_photos/dandelion/3021333497_b927cd8596.jpg']
```
### 检查图片
现在让我们快速浏览几张图片,这样你知道你在处理什么:
```py
import os
attributions = (data_root/"LICENSE.txt").open(encoding='utf-8').readlines()[4:]
attributions = [line.split(' CC-BY') for line in attributions]
attributions = dict(attributions)
```
```py
import IPython.display as display
def caption_image(image_path):
  image_rel = pathlib.Path(image_path).relative_to(data_root)
  return "Image (CC BY 2.0) " + ' - '.join(attributions[str(image_rel)].split(' - ')[:-1])
```
```py
for n in range(3):
  image_path = random.choice(all_image_paths)
  display.display(display.Image(image_path))
  print(caption_image(image_path))
  print()
```
![jpeg](img/e954331a93f7da6b3ebeb6d2c90586f4.png)
```py
Image (CC BY 2.0) by Pavlina Jane
```
![jpeg](img/82eeef92c3c39a6fc38d679c9e4c37fa.png)
```py
Image (CC BY 2.0) by Samantha Forsberg
```
![jpeg](img/13fa130027f8343fe8d952fec8dd0555.png)
```py
Image (CC BY 2.0) by Manu
```
### 确定每张图片的标签
列出可用的标签:
```py
label_names = sorted(item.name for item in data_root.glob('*/') if item.is_dir())
label_names
```
```py
['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']
```
为每个标签分配索引:
```py
label_to_index = dict((name, index) for index, name in enumerate(label_names))
label_to_index
```
```py
{'daisy': 0, 'dandelion': 1, 'roses': 2, 'sunflowers': 3, 'tulips': 4}
```
创建一个列表,包含每个文件的标签索引:
```py
all_image_labels = [label_to_index[pathlib.Path(path).parent.name]
                    for path in all_image_paths]
print("First 10 labels indices: ", all_image_labels[:10])
```
```py
First 10 labels indices: [0, 2, 3, 0, 2, 4, 4, 0, 3, 1]
```
### 加载和格式化图片
TensorFlow 包含加载和处理图片时你需要的所有工具:
```py
img_path = all_image_paths[0]
img_path
```
```py
'/home/kbuilder/.keras/datasets/flower_photos/daisy/4820415253_15bc3b6833_n.jpg'
```
以下是原始数据:
```py
img_raw = tf.io.read_file(img_path)
print(repr(img_raw)[:100]+"...")
```
```py
<tf.Tensor: shape=(), dtype=string, numpy=b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00...
```
将它解码为图像 tensor张量
```py
img_tensor = tf.image.decode_image(img_raw)
print(img_tensor.shape)
print(img_tensor.dtype)
```
```py
(224, 320, 3)
<dtype: 'uint8'>
```
根据你的模型调整其大小:
```py
img_final = tf.image.resize(img_tensor, [192, 192])
img_final = img_final/255.0
print(img_final.shape)
print(img_final.numpy().min())
print(img_final.numpy().max())
```
```py
(192, 192, 3)
0.0
1.0
```
将这些包装在一个简单的函数里,以备后用。
```py
def preprocess_image(image):
  image = tf.image.decode_jpeg(image, channels=3)
  image = tf.image.resize(image, [192, 192])
  image /= 255.0  # normalize to [0,1] range
  return image
```
```py
def load_and_preprocess_image(path):
  image = tf.io.read_file(path)
  return preprocess_image(image)
```
```py
import matplotlib.pyplot as plt
image_path = all_image_paths[0]
label = all_image_labels[0]
plt.imshow(load_and_preprocess_image(img_path))
plt.grid(False)
plt.xlabel(caption_image(img_path))
plt.title(label_names[label].title())
print()
```
![png](img/d99736f992ec3e1883b57ef705221367.png)
## 构建一个 [`tf.data.Dataset`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset)
### 一个图片数据集
构建 [`tf.data.Dataset`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset) 最简单的方法就是使用 `from_tensor_slices` 方法。
将字符串数组切片,得到一个字符串数据集:
```py
path_ds = tf.data.Dataset.from_tensor_slices(all_image_paths)
```
`shapes维数``types类型` 描述数据集里每个数据项的内容。在这里是一组标量二进制字符串。
```py
print(path_ds)
```
```py
<TensorSliceDataset shapes: (), types: tf.string>
```
现在创建一个新的数据集,通过在路径数据集上映射 `preprocess_image` 来动态加载和格式化图片。
```py
image_ds = path_ds.map(load_and_preprocess_image, num_parallel_calls=AUTOTUNE)
```
```py
import matplotlib.pyplot as plt
plt.figure(figsize=(8,8))
for n, image in enumerate(image_ds.take(4)):
  plt.subplot(2,2,n+1)
  plt.imshow(image)
  plt.grid(False)
  plt.xticks([])
  plt.yticks([])
  plt.xlabel(caption_image(all_image_paths[n]))
plt.show()
```
![png](img/87f405a26e039fc527ac7f2dd59de28d.png)
![png](img/309f23cd3db44be87a1c9d9d25619301.png)
![png](img/461f849577ccb00ee49683e824e095cf.png)
![png](img/187f414e1afde064024f6898871831da.png)
### 一个`(图片, 标签)`对数据集
使用同样的 `from_tensor_slices` 方法你可以创建一个标签数据集:
```py
label_ds = tf.data.Dataset.from_tensor_slices(tf.cast(all_image_labels, tf.int64))
```
```py
for label in label_ds.take(10):
  print(label_names[label.numpy()])
```
```py
daisy
roses
sunflowers
daisy
roses
tulips
tulips
daisy
sunflowers
dandelion
```
由于这些数据集顺序相同,你可以将他们打包在一起得到一个`(图片, 标签)`对数据集:
```py
image_label_ds = tf.data.Dataset.zip((image_ds, label_ds))
```
这个新数据集的 `shapes维数``types类型` 也是维数和类型的元组,用来描述每个字段:
```py
print(image_label_ds)
```
```py
<ZipDataset shapes: ((192, 192, 3), ()), types: (tf.float32, tf.int64)>
```
注意:当你拥有形似 `all_image_labels` 和 `all_image_paths` 的数组时,[`tf.data.Dataset.zip`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#zip) 的替代方法是直接对这对数组切片。
```py
ds = tf.data.Dataset.from_tensor_slices((all_image_paths, all_image_labels))
# 元组被解压缩到映射函数的位置参数中
def load_and_preprocess_from_path_label(path, label):
  return load_and_preprocess_image(path), label
image_label_ds = ds.map(load_and_preprocess_from_path_label)
image_label_ds
```
```py
<MapDataset shapes: ((192, 192, 3), ()), types: (tf.float32, tf.int32)>
```
### 训练的基本方法
要使用此数据集训练模型,你将会想要数据:
* 被充分打乱。
* 被分割为 batch。
* 永远重复。
* 尽快提供 batch。
使用 [`tf.data`](https://tensorflow.google.cn/api_docs/python/tf/data) api 可以轻松添加这些功能。
```py
BATCH_SIZE = 32
# 设置一个和数据集大小一致的 shuffle buffer size随机缓冲区大小以保证数据
# 被充分打乱。
ds = image_label_ds.shuffle(buffer_size=image_count)
ds = ds.repeat()
ds = ds.batch(BATCH_SIZE)
# 当模型在训练的时候,`prefetch` 使数据集在后台取得 batch。
ds = ds.prefetch(buffer_size=AUTOTUNE)
ds
```
```py
<PrefetchDataset shapes: ((None, 192, 192, 3), (None,)), types: (tf.float32, tf.int32)>
```
这里有一些注意事项:
1. 顺序很重要。
    * 在 `.repeat` 之后 `.shuffle`,会在 epoch 之间打乱数据(当有些数据出现两次的时候,其他数据还没有出现过)。
    * 在 `.batch` 之后 `.shuffle`,会打乱 batch 的顺序,但是不会在 batch 之间打乱数据。
2. 你在完全打乱中使用和数据集大小一样的 `buffer_size缓冲区大小`。较大的缓冲区大小提供更好的随机化,但使用更多的内存,直到超过数据集大小。
3. 在从随机缓冲区中拉取任何元素前,要先填满它。所以当你的 `Dataset数据集`启动的时候一个大的 `buffer_size缓冲区大小`可能会引起延迟。
4. 在随机缓冲区完全为空之前,被打乱的数据集不会报告数据集的结尾。`Dataset数据集``.repeat` 重新启动,导致需要再次等待随机缓冲区被填满。
最后一点可以通过使用 [`tf.data.Dataset.apply`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#apply) 方法和融合过的 [`tf.data.experimental.shuffle_and_repeat`](https://tensorflow.google.cn/api_docs/python/tf/data/experimental/shuffle_and_repeat) 函数来解决:
```py
ds = image_label_ds.apply(
  tf.data.experimental.shuffle_and_repeat(buffer_size=image_count))
ds = ds.batch(BATCH_SIZE)
ds = ds.prefetch(buffer_size=AUTOTUNE)
ds
```
```py
WARNING:tensorflow:From <ipython-input-1-4dc713bd4d84>:2: shuffle_and_repeat (from tensorflow.python.data.experimental.ops.shuffle_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.shuffle(buffer_size, seed)` followed by `tf.data.Dataset.repeat(count)`. Static tf.data optimizations will take care of using the fused implementation.
<PrefetchDataset shapes: ((None, 192, 192, 3), (None,)), types: (tf.float32, tf.int32)>
```
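上面注意事项中描述的"先填满缓冲区、再随机取出并补位"的行为,可以用纯 Python 模拟来直观理解(仅为示意,与 `tf.data` 的内部实现无关):

```py
import random

def shuffle_with_buffer(items, buffer_size, seed=0):
  """模拟缓冲区式打乱:先填满大小为 buffer_size 的缓冲区,
  之后每次随机取出一个元素,并用输入流的下一个元素补位。"""
  rng = random.Random(seed)
  it = iter(items)
  buffer = []
  for x in it:
    buffer.append(x)
    if len(buffer) == buffer_size:
      break                      # 缓冲区填满前不会产出任何元素
  out = []
  for x in it:
    i = rng.randrange(len(buffer))
    out.append(buffer[i])        # 随机产出一个缓冲元素
    buffer[i] = x                # 用输入流的下一个元素补位
  while buffer:                  # 输入耗尽后清空缓冲区
    out.append(buffer.pop(rng.randrange(len(buffer))))
  return out

shuffled = shuffle_with_buffer(range(10), buffer_size=4)
print(sorted(shuffled) == list(range(10)))  # True:每个元素恰好出现一次
```

可以看到,`buffer_size` 越小,输出顺序越接近输入顺序;只有当缓冲区和数据集一样大时才是完全打乱。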
### 传递数据集至模型
从 [`tf.keras.applications`](https://tensorflow.google.cn/api_docs/python/tf/keras/applications) 取得 MobileNet v2 副本。
该模型副本会被用于一个简单的迁移学习例子。
设置 MobileNet 的权重为不可训练:
```py
mobile_net = tf.keras.applications.MobileNetV2(input_shape=(192, 192, 3), include_top=False)
mobile_net.trainable=False
```
```py
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_192_no_top.h5
9412608/9406464 [==============================] - 0s 0us/step
```
该模型期望它的输入被标准化至 `[-1,1]` 范围内:
```py
help(tf.keras.applications.mobilenet_v2.preprocess_input)
```
```py
该函数使用 "Inception" 式预处理,将
RGB 值从 [0, 255] 转化为 [-1, 1]
```
在你将输入传递给 MobileNet 模型之前,你需要将其范围从 `[0,1]` 转化为 `[-1,1]`:
```py
def change_range(image,label):
  return 2*image-1, label
keras_ds = ds.map(change_range)
```
MobileNet 为每张图片的特征返回一个 `6x6` 的空间网格。
传递一个 batch 的图片给它,查看结果:
```py
# 数据集可能需要几秒来启动,因为要填满其随机缓冲区。
image_batch, label_batch = next(iter(keras_ds))
```
```py
feature_map_batch = mobile_net(image_batch)
print(feature_map_batch.shape)
```
```py
(32, 6, 6, 1280)
```
构建一个包装了 MobileNet 的模型并在 [`tf.keras.layers.Dense`](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/Dense) 输出层之前使用 [`tf.keras.layers.GlobalAveragePooling2D`](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/GlobalAveragePooling2D) 来平均那些空间向量:
```py
model = tf.keras.Sequential([
  mobile_net,
  tf.keras.layers.GlobalAveragePooling2D(),
  tf.keras.layers.Dense(len(label_names), activation = 'softmax')])
```
现在它产出符合预期 shape(维数)的输出:
```py
logit_batch = model(image_batch).numpy()
print("min logit:", logit_batch.min())
print("max logit:", logit_batch.max())
print()
print("Shape:", logit_batch.shape)
```
```py
min logit: 0.0039403443
max logit: 0.82328725
Shape: (32, 5)
```
编译模型以描述训练过程:
```py
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='sparse_categorical_crossentropy',
              metrics=["accuracy"])
```
此处有两个可训练的变量 —— Dense 层中的 `weights权重``bias偏差`
```py
len(model.trainable_variables)
```
```py
2
```
```py
model.summary()
```
```py
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
mobilenetv2_1.00_192 (Functi (None, 6, 6, 1280) 2257984
_________________________________________________________________
global_average_pooling2d (Gl (None, 1280) 0
_________________________________________________________________
dense (Dense) (None, 5) 6405
=================================================================
Total params: 2,264,389
Trainable params: 6,405
Non-trainable params: 2,257,984
_________________________________________________________________
```
你已经准备好来训练模型了。
注意,出于演示目的每一个 epoch 中你将只运行 3 step但一般来说在传递给 `model.fit()` 之前你会指定 step 的真实数量,如下所示:
```py
steps_per_epoch=tf.math.ceil(len(all_image_paths)/BATCH_SIZE).numpy()
steps_per_epoch
```
```py
115.0
```
```py
model.fit(ds, epochs=1, steps_per_epoch=3)
```
```py
3/3 [==============================] - 0s 31ms/step - loss: 1.8837 - accuracy: 0.2812
<tensorflow.python.keras.callbacks.History at 0x7f43ec118eb8>
```
## 性能
注意:这部分只是展示一些可能帮助提升性能的简单技巧。深入指南,请看:[输入 pipeline管道的性能](https://tensorflow.google.cn/guide/performance/datasets)。
上面使用的简单 pipeline管道在每个 epoch 中单独读取每个文件。在本地使用 CPU 训练时这个方法是可行的,但是可能不足以进行 GPU 训练并且完全不适合任何形式的分布式训练。
要研究这点,首先构建一个简单的函数来检查数据集的性能:
```py
import time
default_timeit_steps = 2*steps_per_epoch+1
def timeit(ds, steps=default_timeit_steps):
overall_start = time.time()
# 在开始计时之前
# 取得单个 batch 来填充 pipeline管道填充随机缓冲区
it = iter(ds.take(steps+1))
next(it)
start = time.time()
for i,(images,labels) in enumerate(it):
if i%10 == 0:
print('.',end='')
print()
end = time.time()
duration = end-start
print("{} batches: {} s".format(steps, duration))
print("{:0.5f} Images/s".format(BATCH_SIZE*steps/duration))
print("Total time: {}s".format(end-overall_start))
```
当前数据集的性能是:
```py
ds = image_label_ds.apply(
tf.data.experimental.shuffle_and_repeat(buffer_size=image_count))
ds = ds.batch(BATCH_SIZE).prefetch(buffer_size=AUTOTUNE)
ds
```
```py
<PrefetchDataset shapes: ((None, 192, 192, 3), (None,)), types: (tf.float32, tf.int32)>
```
```py
timeit(ds)
```
```py
........................
231.0 batches: 14.869637966156006 s
497.12037 Images/s
Total time: 21.789817333221436s
```
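输出中的 Images/s 由 `timeit` 按"批大小 × 步数 ÷ 计时时长"计算得出,可以用上面打印的数字复核一下:

```python
BATCH_SIZE = 32
steps = 231.0                   # 即 2 * steps_per_epoch + 1
duration = 14.869637966156006   # timeit 打印出的计时时长(秒)
images_per_sec = BATCH_SIZE * steps / duration
print("{:0.5f} Images/s".format(images_per_sec))  # 约 497.12与上面输出一致
```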
### 缓存
使用 [`tf.data.Dataset.cache`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#cache) 在 epoch 之间轻松缓存计算结果。这是非常高效的,特别是当内存能容纳全部数据时。
在被预处理之后(解码和调整大小),图片在此被缓存了:
```py
ds = image_label_ds.cache()
ds = ds.apply(
tf.data.experimental.shuffle_and_repeat(buffer_size=image_count))
ds = ds.batch(BATCH_SIZE).prefetch(buffer_size=AUTOTUNE)
ds
```
```py
<PrefetchDataset shapes: ((None, 192, 192, 3), (None,)), types: (tf.float32, tf.int32)>
```
```py
timeit(ds)
```
```py
........................
231.0 batches: 0.5994970798492432 s
12330.33529 Images/s
Total time: 7.475242614746094s
```
使用内存缓存的一个缺点是必须在每次运行时重建缓存,这使得每次启动数据集时有相同的启动延迟:
```py
timeit(ds)
```
```py
........................
231.0 batches: 0.6120779514312744 s
12076.89312 Images/s
Total time: 0.6253445148468018s
```
如果内存不够容纳数据,使用一个缓存文件:
```py
ds = image_label_ds.cache(filename='./cache.tf-data')
ds = ds.apply(
tf.data.experimental.shuffle_and_repeat(buffer_size=image_count))
ds = ds.batch(BATCH_SIZE).prefetch(1)
ds
```
```py
<PrefetchDataset shapes: ((None, 192, 192, 3), (None,)), types: (tf.float32, tf.int32)>
```
```py
timeit(ds)
```
```py
........................
231.0 batches: 3.0341720581054688 s
2436.24945 Images/s
Total time: 12.044088363647461s
```
这个缓存文件也有可快速重启数据集而无需重建缓存的优点。注意第二次快了多少:
```py
timeit(ds)
```
```py
........................
231.0 batches: 2.358055353164673 s
3134.78646 Images/s
Total time: 3.105525493621826s
```
### TFRecord 文件
#### 原始图片数据
TFRecord 文件是一种用来存储一串二进制 blob 的简单格式。通过将多个示例打包进同一个文件内TensorFlow 能够一次性读取多个示例;当使用远程存储服务(如 GCS这对性能来说尤其重要。
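为了直观理解"一串二进制 blob"这种存储方式,下面用纯 Python 勾勒一个极简的记录打包格式(仅为示意:真实的 TFRecord 还会在长度前缀和数据之后各附带一个 CRC 校验,函数名均为假设):

```python
import io
import struct

def write_records(blobs):
    # 每条记录 = 8 字节小端长度前缀 + 原始数据
    buf = io.BytesIO()
    for blob in blobs:
        buf.write(struct.pack('<Q', len(blob)))
        buf.write(blob)
    return buf.getvalue()

def read_records(data):
    # 依次读出长度前缀,再按长度切出每条记录
    records, offset = [], 0
    while offset < len(data):
        (n,) = struct.unpack_from('<Q', data, offset)
        offset += 8
        records.append(data[offset:offset + n])
        offset += n
    return records

blobs = [b'first image bytes', b'second image bytes']
assert read_records(write_records(blobs)) == blobs
```

把多条记录顺序写进同一个文件,读取端就能一次顺序扫描取回所有示例,这正是 TFRecord 对远程存储友好的原因。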
首先,从原始图片数据中构建出一个 TFRecord 文件:
```py
image_ds = tf.data.Dataset.from_tensor_slices(all_image_paths).map(tf.io.read_file)
tfrec = tf.data.experimental.TFRecordWriter('images.tfrec')
tfrec.write(image_ds)
```
接着,构建一个从 TFRecord 文件读取的数据集,并使用你之前定义的 `preprocess_image` 函数对图像进行解码/重新格式化:
```py
image_ds = tf.data.TFRecordDataset('images.tfrec').map(preprocess_image)
```
压缩该数据集和你之前定义的标签数据集以得到期望的 `(图片,标签)` 对:
```py
ds = tf.data.Dataset.zip((image_ds, label_ds))
ds = ds.apply(
tf.data.experimental.shuffle_and_repeat(buffer_size=image_count))
ds=ds.batch(BATCH_SIZE).prefetch(AUTOTUNE)
ds
```
```py
<PrefetchDataset shapes: ((None, 192, 192, 3), (None,)), types: (tf.float32, tf.int64)>
```
```py
timeit(ds)
```
```py
........................
231.0 batches: 14.661343574523926 s
504.18299 Images/s
Total time: 21.57948637008667s
```
这比 `缓存` 版本慢,因为你还没有缓存预处理。
#### 序列化的 Tensor张量
要为 TFRecord 文件省去一些预处理过程,首先像之前一样制作一个处理过的图片数据集:
```py
paths_ds = tf.data.Dataset.from_tensor_slices(all_image_paths)
image_ds = paths_ds.map(load_and_preprocess_image)
image_ds
```
```py
<MapDataset shapes: (192, 192, 3), types: tf.float32>
```
现在你有一个 tensor张量数据集而不是一个 `.jpeg` 字符串数据集。
要将此序列化至一个 TFRecord 文件你首先将该 tensor张量数据集转化为一个字符串数据集
```py
ds = image_ds.map(tf.io.serialize_tensor)
ds
```
```py
<MapDataset shapes: (), types: tf.string>
```
```py
tfrec = tf.data.experimental.TFRecordWriter('images.tfrec')
tfrec.write(ds)
```
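`tf.io.serialize_tensor` 会把张量的数据连同 dtype 和 shape 编码成单个字节串。其思路可以用纯 Python 做一个简化示意(仅处理一维 float32函数名为假设真实实现使用 TensorProto 编码):

```python
import struct

def serialize_floats(values):
    # 示意4 字节元素个数 + 连续的 float32 数据
    return struct.pack('<I', len(values)) + struct.pack('<{}f'.format(len(values)), *values)

def parse_floats(data):
    # 先读出元素个数,再解出对应数量的 float32
    (n,) = struct.unpack_from('<I', data, 0)
    return list(struct.unpack_from('<{}f'.format(n), data, 4))

values = [0.0, 0.5, 1.0]
assert parse_floats(serialize_floats(values)) == values
```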
有了被缓存的预处理,就能从 TFRecord 文件高效地加载数据,只需记得在使用它之前反序列化:
```py
ds = tf.data.TFRecordDataset('images.tfrec')
def parse(x):
result = tf.io.parse_tensor(x, out_type=tf.float32)
result = tf.reshape(result, [192, 192, 3])
return result
ds = ds.map(parse, num_parallel_calls=AUTOTUNE)
ds
```
```py
<ParallelMapDataset shapes: (192, 192, 3), types: tf.float32>
```
现在,像之前一样添加标签和进行相同的标准操作:
```py
ds = tf.data.Dataset.zip((ds, label_ds))
ds = ds.apply(
tf.data.experimental.shuffle_and_repeat(buffer_size=image_count))
ds=ds.batch(BATCH_SIZE).prefetch(AUTOTUNE)
ds
```
```py
<PrefetchDataset shapes: ((None, 192, 192, 3), (None,)), types: (tf.float32, tf.int64)>
```
```py
timeit(ds)
```
```py
........................
231.0 batches: 1.8890972137451172 s
3912.98020 Images/s
Total time: 2.7021732330322266s
```
# 使用 tf.data 加载文本数据
> 原文:[https://tensorflow.google.cn/tutorials/load_data/text](https://tensorflow.google.cn/tutorials/load_data/text)
**Note:** 我们的 TensorFlow 社区翻译了这些文档。因为社区翻译是尽力而为, 所以无法保证它们是最准确的,并且反映了最新的 [官方英文文档](https://tensorflow.google.cn/?hl=en)。如果您有改进此翻译的建议, 请提交 pull request 到 [tensorflow/docs](https://github.com/tensorflow/docs) GitHub 仓库。要志愿地撰写或者审核译文,请加入 [docs-zh-cn@tensorflow.org Google Group](https://groups.google.com/a/tensorflow.org/forum/#!forum/docs-zh-cn)。
本教程为你提供了一个如何使用 [`tf.data.TextLineDataset`](https://tensorflow.google.cn/api_docs/python/tf/data/TextLineDataset) 来加载文本文件的示例。`TextLineDataset` 通常被用来以文本文件构建数据集(原文件中的一行为一个样本) 。这适用于大多数的基于行的文本数据(例如,诗歌或错误日志) 。下面我们将使用相同作品(荷马的伊利亚特)三个不同版本的英文翻译,然后训练一个模型来通过单行文本确定译者。
## 环境搭建
```py
import tensorflow as tf
import tensorflow_datasets as tfds
import os
```
三个版本的翻译分别来自于:
* [William Cowper](https://en.wikipedia.org/wiki/William_Cowper) — [text](https://storage.googleapis.com/download.tensorflow.org/data/illiad/cowper.txt)
* [Edward, Earl of Derby](https://en.wikipedia.org/wiki/Edward_Smith-Stanley,_14th_Earl_of_Derby) — [text](https://storage.googleapis.com/download.tensorflow.org/data/illiad/derby.txt)
* [Samuel Butler](https://en.wikipedia.org/wiki/Samuel_Butler_%28novelist%29) — [text](https://storage.googleapis.com/download.tensorflow.org/data/illiad/butler.txt)
本教程中使用的文本文件已经进行过一些典型的预处理,主要包括删除文档页眉和页脚、行号、章节标题。请下载这些经过少量修改的文件。
```py
DIRECTORY_URL = 'https://storage.googleapis.com/download.tensorflow.org/data/illiad/'
FILE_NAMES = ['cowper.txt', 'derby.txt', 'butler.txt']
for name in FILE_NAMES:
text_dir = tf.keras.utils.get_file(name, origin=DIRECTORY_URL+name)
parent_dir = os.path.dirname(text_dir)
parent_dir
```
```py
Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/illiad/cowper.txt
819200/815980 [==============================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/illiad/derby.txt
811008/809730 [==============================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/illiad/butler.txt
811008/807992 [==============================] - 0s 0us/step
'/home/kbuilder/.keras/datasets'
```
## 将文本加载到数据集中
迭代这些文件,将每个文件加载到各自的数据集中。
每个样本都需要单独标记,所以请使用 [`tf.data.Dataset.map`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#map) 来为每个样本设定标签。这将迭代数据集中的每一个样本并且返回( `example, label` )对。
```py
def labeler(example, index):
return example, tf.cast(index, tf.int64)
labeled_data_sets = []
for i, file_name in enumerate(FILE_NAMES):
lines_dataset = tf.data.TextLineDataset(os.path.join(parent_dir, file_name))
labeled_dataset = lines_dataset.map(lambda ex: labeler(ex, i))
labeled_data_sets.append(labeled_dataset)
```
将这些标记的数据集合并到一个数据集中,然后对其进行随机化操作。
```py
BUFFER_SIZE = 50000
BATCH_SIZE = 64
TAKE_SIZE = 5000
```
```py
all_labeled_data = labeled_data_sets[0]
for labeled_dataset in labeled_data_sets[1:]:
all_labeled_data = all_labeled_data.concatenate(labeled_dataset)
all_labeled_data = all_labeled_data.shuffle(
BUFFER_SIZE, reshuffle_each_iteration=False)
```
你可以使用 [`tf.data.Dataset.take`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#take) 与 `print` 来查看 `(example, label)` 对的外观。`numpy` 属性显示每个 Tensor 的值。
```py
for ex in all_labeled_data.take(5):
print(ex)
```
```py
(<tf.Tensor: shape=(), dtype=string, numpy=b'To Ida; in his presence once arrived,'>, <tf.Tensor: shape=(), dtype=int64, numpy=0>)
(<tf.Tensor: shape=(), dtype=string, numpy=b"Such now appears th' o'er-ruling sov'reign will">, <tf.Tensor: shape=(), dtype=int64, numpy=1>)
(<tf.Tensor: shape=(), dtype=string, numpy=b'Them so prepared the King of men beheld'>, <tf.Tensor: shape=(), dtype=int64, numpy=0>)
(<tf.Tensor: shape=(), dtype=string, numpy=b'mourn you, but the eddies of Scamander shall bear you into the broad'>, <tf.Tensor: shape=(), dtype=int64, numpy=2>)
(<tf.Tensor: shape=(), dtype=string, numpy=b'there was no life left in him.'>, <tf.Tensor: shape=(), dtype=int64, numpy=2>)
```
## 将文本编码成数字
机器学习基于的是数字而非文本,所以字符串需要被转化成数字列表。 为了达到此目的,我们需要构建文本与整数的一一映射。
### 建立词汇表
首先,通过将文本标记为单独的单词集合来构建词汇表。在 TensorFlow 和 Python 中均有很多方法来达成这一目的。在本教程中:
1. 迭代每个样本的 `numpy` 值。
2. 使用 `tfds.features.text.Tokenizer` 来将其分割成 `token`
3. 将这些 `token` 放入一个 Python 集合中,借此来清除重复项。
4. 获取该词汇表的大小以便于以后使用。
```py
tokenizer = tfds.features.text.Tokenizer()
vocabulary_set = set()
for text_tensor, _ in all_labeled_data:
some_tokens = tokenizer.tokenize(text_tensor.numpy())
vocabulary_set.update(some_tokens)
vocab_size = len(vocabulary_set)
vocab_size
```
```py
17178
```
### 样本编码
通过传递 `vocabulary_set``tfds.features.text.TokenTextEncoder` 来构建一个编码器。编码器的 `encode` 方法传入一行文本,返回一个整数列表。
```py
encoder = tfds.features.text.TokenTextEncoder(vocabulary_set)
```
你可以尝试运行这一行代码并查看输出的样式。
```py
example_text = next(iter(all_labeled_data))[0].numpy()
print(example_text)
```
```py
b'To Ida; in his presence once arrived,'
```
```py
encoded_example = encoder.encode(example_text)
print(encoded_example)
```
```py
[15746, 11433, 8394, 9006, 379, 3463, 17072]
```
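`TokenTextEncoder` 的核心逻辑可以用纯 Python 概括一下(仅为示意:真实实现还会处理标点切分、未登录词等,函数名均为假设):

```python
def build_table(vocabulary):
    # id 从 1 开始0 保留给之后的填充 token
    return {token: i for i, token in enumerate(sorted(vocabulary), start=1)}

def encode(text, token_to_id):
    # 简化版分词:按空白切分
    return [token_to_id[t] for t in text.split() if t in token_to_id]

vocab = {'To', 'Ida', 'in', 'his', 'presence'}
table = build_table(vocab)
print(encode('To Ida in his presence', table))  # [2, 1, 4, 3, 5]
```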
现在,在数据集上运行编码器(通过将编码器打包到 [`tf.py_function`](https://tensorflow.google.cn/api_docs/python/tf/py_function) 并且传参至数据集的 `map` 方法的方式来运行)。
```py
def encode(text_tensor, label):
encoded_text = encoder.encode(text_tensor.numpy())
return encoded_text, label
def encode_map_fn(text, label):
# py_func doesn't set the shape of the returned tensors.
encoded_text, label = tf.py_function(encode,
inp=[text, label],
Tout=(tf.int64, tf.int64))
# `tf.data.Datasets` work best if all components have a shape set
# so set the shapes manually:
encoded_text.set_shape([None])
label.set_shape([])
return encoded_text, label
all_encoded_data = all_labeled_data.map(encode_map_fn)
```
## 将数据集分割为测试集和训练集,并进行分批
使用 [`tf.data.Dataset.take`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#take) 和 [`tf.data.Dataset.skip`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#skip) 来建立一个小一些的测试数据集和稍大一些的训练数据集。
在数据集被传入模型之前,需要先分批。最典型的情况下,每个批次中样本的大小与格式需要一致。但是数据集中的样本并不全是相同大小的(每行文本的字数并不相同)。因此,使用 [`tf.data.Dataset.padded_batch`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#padded_batch)(而不是 `batch`)将样本填充到相同的大小。
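`padded_batch` 的填充行为可以用纯 Python 这样示意(将批内每个序列补零到该批次中最长序列的长度,函数名为假设):

```python
def pad_batch(sequences, pad_value=0):
    # 批内每个序列都补齐到本批次最长序列的长度
    max_len = max(len(s) for s in sequences)
    return [s + [pad_value] * (max_len - len(s)) for s in sequences]

batch = [[15746, 11433, 8394], [379], [3463, 17072]]
print(pad_batch(batch))
# [[15746, 11433, 8394], [379, 0, 0], [3463, 17072, 0]]
```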
```py
train_data = all_encoded_data.skip(TAKE_SIZE).shuffle(BUFFER_SIZE)
train_data = train_data.padded_batch(BATCH_SIZE)
test_data = all_encoded_data.take(TAKE_SIZE)
test_data = test_data.padded_batch(BATCH_SIZE)
```
现在,`test_data` 和 `train_data` 不是(`example, label`)对的集合,而是批次的集合。每个批次都是一对(*多样本*、*多标签*),表示为数组。
```py
sample_text, sample_labels = next(iter(test_data))
sample_text[0], sample_labels[0]
```
```py
(<tf.Tensor: shape=(16,), dtype=int64, numpy=
array([15746, 11433, 8394, 9006, 379, 3463, 17072, 0, 0,
0, 0, 0, 0, 0, 0, 0])>,
<tf.Tensor: shape=(), dtype=int64, numpy=0>)
```
由于我们引入了一个新的 token 来编码(填充零),因此词汇表大小增加了一个。
```py
vocab_size += 1
```
## 建立模型
```py
model = tf.keras.Sequential()
```
第一层将整数表示转换为密集矢量嵌入。更多内容请查阅 [Word Embeddings](https://tensorflow.google.cn/tutorials/sequences/word_embeddings) 教程。
```py
model.add(tf.keras.layers.Embedding(vocab_size, 64))
```
下一层是 [LSTM](http://colah.github.io/posts/2015-08-Understanding-LSTMs/) 层,它让模型能够结合上下文理解单词的含义。LSTM 上的双向包装器有助于模型理解当前数据点与其前后数据点的关系。
```py
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)))
```
最后,我们将获得一个或多个紧密连接的层,其中最后一层是输出层。输出层输出样本属于各个标签的概率,最后具有最高概率的分类标签即为最终预测结果。
```py
# 一个或多个紧密连接的层
# 编辑 `for` 行的列表去检测层的大小
for units in [64, 64]:
model.add(tf.keras.layers.Dense(units, activation='relu'))
# 输出层。第一个参数是标签个数。
model.add(tf.keras.layers.Dense(3, activation='softmax'))
```
最后,编译这个模型。对于一个 softmax 分类模型来说,通常使用 `sparse_categorical_crossentropy` 作为其损失函数。你可以尝试其他的优化器,但是 `adam` 是最常用的。
```py
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
```
## 训练模型
利用提供的数据训练出的模型有着不错的精度(大约 83% )。
```py
model.fit(train_data, epochs=3, validation_data=test_data)
```
```py
Epoch 1/3
697/697 [==============================] - 10s 14ms/step - loss: 0.5181 - accuracy: 0.7457 - val_loss: 0.3855 - val_accuracy: 0.8222
Epoch 2/3
697/697 [==============================] - 9s 13ms/step - loss: 0.2985 - accuracy: 0.8685 - val_loss: 0.3635 - val_accuracy: 0.8350
Epoch 3/3
697/697 [==============================] - 9s 13ms/step - loss: 0.2242 - accuracy: 0.9027 - val_loss: 0.3794 - val_accuracy: 0.8246
<tensorflow.python.keras.callbacks.History at 0x7f4ff462aba8>
```
```py
eval_loss, eval_acc = model.evaluate(test_data)
print('\nEval loss: {}, Eval accuracy: {}'.format(eval_loss, eval_acc))
```
```py
79/79 [==============================] - 1s 18ms/step - loss: 0.3794 - accuracy: 0.8246
Eval loss: 0.3794495761394501, Eval accuracy: 0.8245999813079834
```
# 用 tf.data 加载 CSV 数据
> 原文:[https://tensorflow.google.cn/tutorials/load_data/csv](https://tensorflow.google.cn/tutorials/load_data/csv)
**Note:** 我们的 TensorFlow 社区翻译了这些文档。因为社区翻译是尽力而为, 所以无法保证它们是最准确的,并且反映了最新的 [官方英文文档](https://tensorflow.google.cn/?hl=en)。如果您有改进此翻译的建议, 请提交 pull request 到 [tensorflow/docs](https://github.com/tensorflow/docs) GitHub 仓库。要志愿地撰写或者审核译文,请加入 [docs-zh-cn@tensorflow.org Google Group](https://groups.google.com/a/tensorflow.org/forum/#!forum/docs-zh-cn)。
这篇教程通过一个示例展示了怎样将 CSV 格式的数据加载进 [`tf.data.Dataset`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset)。
这篇教程使用的是泰坦尼克号乘客的数据。模型会根据乘客的年龄、性别、票务舱和是否独自旅行等特征来预测乘客生还的可能性。
## 设置
```py
import functools
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
```
```py
TRAIN_DATA_URL = "https://storage.googleapis.com/tf-datasets/titanic/train.csv"
TEST_DATA_URL = "https://storage.googleapis.com/tf-datasets/titanic/eval.csv"
train_file_path = tf.keras.utils.get_file("train.csv", TRAIN_DATA_URL)
test_file_path = tf.keras.utils.get_file("eval.csv", TEST_DATA_URL)
```
```py
Downloading data from https://storage.googleapis.com/tf-datasets/titanic/train.csv
32768/30874 [===============================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tf-datasets/titanic/eval.csv
16384/13049 [=====================================] - 0s 0us/step
```
```py
# 让 numpy 数据更易读。
np.set_printoptions(precision=3, suppress=True)
```
## 加载数据
开始的时候,我们通过打印 CSV 文件的前几行来了解文件的格式。
```py
!head {train_file_path}
```
```py
survived,sex,age,n_siblings_spouses,parch,fare,class,deck,embark_town,alone
0,male,22.0,1,0,7.25,Third,unknown,Southampton,n
1,female,38.0,1,0,71.2833,First,C,Cherbourg,n
1,female,26.0,0,0,7.925,Third,unknown,Southampton,y
1,female,35.0,1,0,53.1,First,C,Southampton,n
0,male,28.0,0,0,8.4583,Third,unknown,Queenstown,y
0,male,2.0,3,1,21.075,Third,unknown,Southampton,n
1,female,27.0,0,2,11.1333,Third,unknown,Southampton,n
1,female,14.0,1,0,30.0708,Second,unknown,Cherbourg,n
1,female,4.0,1,1,16.7,Third,G,Southampton,n
```
正如你看到的那样CSV 文件的每列都会有一个列名。dataset 的构造函数会自动识别这些列名。如果你使用的文件的第一行不包含列名,那么需要将列名通过字符串列表传给 `make_csv_dataset` 函数的 `column_names` 参数。
```py
CSV_COLUMNS = ['survived', 'sex', 'age', 'n_siblings_spouses', 'parch', 'fare', 'class', 'deck', 'embark_town', 'alone']
dataset = tf.data.experimental.make_csv_dataset(
...,
column_names=CSV_COLUMNS,
...)
```
这个示例使用了所有的列。如果你需要忽略数据集中的某些列,创建一个包含你需要使用的列的列表,然后传给构造器的(可选)参数 `select_columns`
```py
dataset = tf.data.experimental.make_csv_dataset(
...,
select_columns = columns_to_use,
...)
```
你需要显式指定包含模型所要预测的值的那一列。
```py
LABEL_COLUMN = 'survived'
LABELS = [0, 1]
```
现在从文件中读取 CSV 数据并且创建 dataset。
(完整的文档,参考 [`tf.data.experimental.make_csv_dataset`](https://tensorflow.google.cn/api_docs/python/tf/data/experimental/make_csv_dataset))
```py
def get_dataset(file_path):
dataset = tf.data.experimental.make_csv_dataset(
file_path,
batch_size=12, # 为了示例更容易展示,手动设置较小的值
label_name=LABEL_COLUMN,
na_value="?",
num_epochs=1,
ignore_errors=True)
return dataset
raw_train_data = get_dataset(train_file_path)
raw_test_data = get_dataset(test_file_path)
```
dataset 中的每个条目都是一个批次,用一个元组(*多个样本**多个标签*)表示。样本中的数据组织形式是以列为主的张量(而不是以行为主的张量),每条数据中包含的元素个数就是批次大小(这个示例中是 12
阅读下面的示例有助于你的理解。
```py
examples, labels = next(iter(raw_train_data)) # 第一个批次
print("EXAMPLES: \n", examples, "\n")
print("LABELS: \n", labels)
```
```py
EXAMPLES:
OrderedDict([('sex', <tf.Tensor: shape=(12,), dtype=string, numpy=
array([b'male', b'male', b'male', b'male', b'male', b'female', b'male',
b'female', b'male', b'male', b'male', b'female'], dtype=object)>), ('age', <tf.Tensor: shape=(12,), dtype=float32, numpy=
array([35., 30., 28., 40., 17., 19., 21., 7., 58., 26., 19., 29.],
dtype=float32)>), ('n_siblings_spouses', <tf.Tensor: shape=(12,), dtype=int32, numpy=array([0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1], dtype=int32)>), ('parch', <tf.Tensor: shape=(12,), dtype=int32, numpy=array([0, 0, 0, 0, 0, 2, 0, 2, 0, 0, 0, 0], dtype=int32)>), ('fare', <tf.Tensor: shape=(12,), dtype=float32, numpy=
array([ 8.05 , 13\. , 7.225, 7.896, 8.663, 26.283, 7.925, 26.25 ,
29.7 , 8.663, 0\. , 26\. ], dtype=float32)>), ('class', <tf.Tensor: shape=(12,), dtype=string, numpy=
array([b'Third', b'Second', b'Third', b'Third', b'Third', b'First',
b'Third', b'Second', b'First', b'Third', b'Third', b'Second'],
dtype=object)>), ('deck', <tf.Tensor: shape=(12,), dtype=string, numpy=
array([b'unknown', b'unknown', b'unknown', b'unknown', b'unknown', b'D',
b'unknown', b'unknown', b'B', b'unknown', b'unknown', b'unknown'],
dtype=object)>), ('embark_town', <tf.Tensor: shape=(12,), dtype=string, numpy=
array([b'Southampton', b'Southampton', b'Cherbourg', b'Southampton',
b'Southampton', b'Southampton', b'Southampton', b'Southampton',
b'Cherbourg', b'Southampton', b'Southampton', b'Southampton'],
dtype=object)>), ('alone', <tf.Tensor: shape=(12,), dtype=string, numpy=
array([b'y', b'y', b'y', b'y', b'y', b'n', b'y', b'n', b'y', b'n', b'y',
b'n'], dtype=object)>)])
LABELS:
tf.Tensor([0 0 0 0 0 1 0 1 0 0 0 1], shape=(12,), dtype=int32)
```
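"以列为主"的张量组织方式,相当于把按行的记录转置成按列聚合的字典,对应上面 `OrderedDict` 输出的结构(纯 Python 示意,字段与函数名均为假设):

```python
from collections import OrderedDict

def to_columns(rows):
    # 把一批按行组织的记录转置为按列聚合的字典,
    # 每个键对应一列,值是该列在整个批次中的取值
    return OrderedDict((key, [row[key] for row in rows]) for key in rows[0])

rows = [
    {'sex': 'male', 'age': 35.0},
    {'sex': 'female', 'age': 19.0},
]
print(to_columns(rows))
# OrderedDict([('sex', ['male', 'female']), ('age', [35.0, 19.0])])
```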
## 数据预处理
### 分类数据
CSV 数据中的有些列是分类的列。也就是说,这些列只能在有限的集合中取值。
使用 [`tf.feature_column`](https://tensorflow.google.cn/api_docs/python/tf/feature_column) API 创建一个 [`tf.feature_column.indicator_column`](https://tensorflow.google.cn/api_docs/python/tf/feature_column/indicator_column) 集合,每个 [`tf.feature_column.indicator_column`](https://tensorflow.google.cn/api_docs/python/tf/feature_column/indicator_column) 对应一个分类的列。
```py
CATEGORIES = {
'sex': ['male', 'female'],
'class' : ['First', 'Second', 'Third'],
'deck' : ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
    'embark_town' : ['Cherbourg', 'Southampton', 'Queenstown'],  # 与数据中的拼写 'Southampton' 保持一致
'alone' : ['y', 'n']
}
```
```py
categorical_columns = []
for feature, vocab in CATEGORIES.items():
cat_col = tf.feature_column.categorical_column_with_vocabulary_list(
key=feature, vocabulary_list=vocab)
categorical_columns.append(tf.feature_column.indicator_column(cat_col))
```
```py
# 你刚才创建的内容
categorical_columns
```
```py
[IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='sex', vocabulary_list=('male', 'female'), dtype=tf.string, default_value=-1, num_oov_buckets=0)),
IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='class', vocabulary_list=('First', 'Second', 'Third'), dtype=tf.string, default_value=-1, num_oov_buckets=0)),
IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='deck', vocabulary_list=('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'), dtype=tf.string, default_value=-1, num_oov_buckets=0)),
 IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='embark_town', vocabulary_list=('Cherbourg', 'Southampton', 'Queenstown'), dtype=tf.string, default_value=-1, num_oov_buckets=0)),
IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='alone', vocabulary_list=('y', 'n'), dtype=tf.string, default_value=-1, num_oov_buckets=0))]
```
这将是后续构建模型时处理输入数据的一部分。
### 连续数据
连续数据需要标准化。
写一个函数标准化这些值,然后将这些值改造成 2 维的张量。
```py
def process_continuous_data(mean, data):
# 标准化数据
data = tf.cast(data, tf.float32) * 1/(2*mean)
return tf.reshape(data, [-1, 1])
```
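上面的缩放把每个值除以 2 倍均值,因此恰好等于均值的输入会被映射到 0.5(纯 Python 示意,函数名为假设):

```python
def normalize(values, mean):
    # 与上面的 process_continuous_data 相同的缩放x / (2 * mean)
    return [v / (2 * mean) for v in values]

print(normalize([29.631308, 0.0, 59.262616], 29.631308))  # [0.5, 0.0, 1.0]
```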
现在创建一个数值列的集合。[`tf.feature_column.numeric_column`](https://tensorflow.google.cn/api_docs/python/tf/feature_column/numeric_column) API 接受一个 `normalizer_fn` 参数。这里借助 [`functools.partial`](https://docs.python.org/3/library/functools.html#functools.partial) 把每列的均值绑定到标准化函数上,再作为 `normalizer_fn` 传入。
```py
MEANS = {
'age' : 29.631308,
'n_siblings_spouses' : 0.545455,
'parch' : 0.379585,
'fare' : 34.385399
}
numerical_columns = []
for feature in MEANS.keys():
num_col = tf.feature_column.numeric_column(feature, normalizer_fn=functools.partial(process_continuous_data, MEANS[feature]))
numerical_columns.append(num_col)
```
```py
# 你刚才创建的内容。
numerical_columns
```
```py
[NumericColumn(key='age', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=functools.partial(<function process_continuous_data at 0x7f3f083021e0>, 29.631308)),
NumericColumn(key='n_siblings_spouses', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=functools.partial(<function process_continuous_data at 0x7f3f083021e0>, 0.545455)),
NumericColumn(key='parch', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=functools.partial(<function process_continuous_data at 0x7f3f083021e0>, 0.379585)),
NumericColumn(key='fare', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=functools.partial(<function process_continuous_data at 0x7f3f083021e0>, 34.385399))]
```
这里使用标准化的方法需要提前知道每列的均值。如果需要计算连续的数据流的标准化的值可以使用 [TensorFlow Transform](https://tensorflow.google.cn/tfx/transform/get_started)。
### 创建预处理层
合并这两个特征列的列表,并将其传给 [`tf.keras.layers.DenseFeatures`](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/DenseFeatures),从而创建一个进行预处理的输入层。
```py
preprocessing_layer = tf.keras.layers.DenseFeatures(categorical_columns+numerical_columns)
```
## 构建模型
`preprocessing_layer` 开始构建 [`tf.keras.Sequential`](https://tensorflow.google.cn/api_docs/python/tf/keras/Sequential)。
```py
model = tf.keras.Sequential([
preprocessing_layer,
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(
loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
```
## 训练、评估和预测
现在可以实例化和训练模型。
```py
train_data = raw_train_data.shuffle(500)
test_data = raw_test_data
```
```py
model.fit(train_data, epochs=20)
```
```py
Epoch 1/20
WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'collections.OrderedDict'> input: OrderedDict([('sex', <tf.Tensor 'ExpandDims_8:0' shape=(None, 1) dtype=string>), ('age', <tf.Tensor 'ExpandDims:0' shape=(None, 1) dtype=float32>), ('n_siblings_spouses', <tf.Tensor 'ExpandDims_6:0' shape=(None, 1) dtype=int32>), ('parch', <tf.Tensor 'ExpandDims_7:0' shape=(None, 1) dtype=int32>), ('fare', <tf.Tensor 'ExpandDims_5:0' shape=(None, 1) dtype=float32>), ('class', <tf.Tensor 'ExpandDims_2:0' shape=(None, 1) dtype=string>), ('deck', <tf.Tensor 'ExpandDims_3:0' shape=(None, 1) dtype=string>), ('embark_town', <tf.Tensor 'ExpandDims_4:0' shape=(None, 1) dtype=string>), ('alone', <tf.Tensor 'ExpandDims_1:0' shape=(None, 1) dtype=string>)])
Consider rewriting this model with the Functional API.
WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'collections.OrderedDict'> input: OrderedDict([('sex', <tf.Tensor 'ExpandDims_8:0' shape=(None, 1) dtype=string>), ('age', <tf.Tensor 'ExpandDims:0' shape=(None, 1) dtype=float32>), ('n_siblings_spouses', <tf.Tensor 'ExpandDims_6:0' shape=(None, 1) dtype=int32>), ('parch', <tf.Tensor 'ExpandDims_7:0' shape=(None, 1) dtype=int32>), ('fare', <tf.Tensor 'ExpandDims_5:0' shape=(None, 1) dtype=float32>), ('class', <tf.Tensor 'ExpandDims_2:0' shape=(None, 1) dtype=string>), ('deck', <tf.Tensor 'ExpandDims_3:0' shape=(None, 1) dtype=string>), ('embark_town', <tf.Tensor 'ExpandDims_4:0' shape=(None, 1) dtype=string>), ('alone', <tf.Tensor 'ExpandDims_1:0' shape=(None, 1) dtype=string>)])
Consider rewriting this model with the Functional API.
53/53 [==============================] - 0s 4ms/step - loss: 0.5501 - accuracy: 0.7225
Epoch 2/20
53/53 [==============================] - 0s 3ms/step - loss: 0.4399 - accuracy: 0.8102
Epoch 3/20
53/53 [==============================] - 0s 3ms/step - loss: 0.4158 - accuracy: 0.8150
Epoch 4/20
53/53 [==============================] - 0s 3ms/step - loss: 0.4137 - accuracy: 0.8118
Epoch 5/20
53/53 [==============================] - 0s 3ms/step - loss: 0.4011 - accuracy: 0.8278
Epoch 6/20
53/53 [==============================] - 0s 3ms/step - loss: 0.3953 - accuracy: 0.8198
Epoch 7/20
53/53 [==============================] - 0s 3ms/step - loss: 0.3834 - accuracy: 0.8325
Epoch 8/20
53/53 [==============================] - 0s 3ms/step - loss: 0.3831 - accuracy: 0.8309
Epoch 9/20
53/53 [==============================] - 0s 3ms/step - loss: 0.3768 - accuracy: 0.8453
Epoch 10/20
53/53 [==============================] - 0s 3ms/step - loss: 0.3710 - accuracy: 0.8437
Epoch 11/20
53/53 [==============================] - 0s 3ms/step - loss: 0.3704 - accuracy: 0.8389
Epoch 12/20
53/53 [==============================] - 0s 3ms/step - loss: 0.3670 - accuracy: 0.8325
Epoch 13/20
53/53 [==============================] - 0s 3ms/step - loss: 0.3603 - accuracy: 0.8517
Epoch 14/20
53/53 [==============================] - 0s 3ms/step - loss: 0.3548 - accuracy: 0.8501
Epoch 15/20
53/53 [==============================] - 0s 3ms/step - loss: 0.3554 - accuracy: 0.8469
Epoch 16/20
53/53 [==============================] - 0s 3ms/step - loss: 0.3519 - accuracy: 0.8453
Epoch 17/20
53/53 [==============================] - 0s 3ms/step - loss: 0.3472 - accuracy: 0.8596
Epoch 18/20
53/53 [==============================] - 0s 3ms/step - loss: 0.3513 - accuracy: 0.8581
Epoch 19/20
53/53 [==============================] - 0s 3ms/step - loss: 0.3448 - accuracy: 0.8469
Epoch 20/20
53/53 [==============================] - 0s 3ms/step - loss: 0.3390 - accuracy: 0.8581
<tensorflow.python.keras.callbacks.History at 0x7f3f082606a0>
```
当模型训练完成的时候,你可以在测试集 `test_data` 上检查准确性。
```py
test_loss, test_accuracy = model.evaluate(test_data)
print('\n\nTest Loss {}, Test Accuracy {}'.format(test_loss, test_accuracy))
```
```py
WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'collections.OrderedDict'> input: OrderedDict([('sex', <tf.Tensor 'ExpandDims_8:0' shape=(None, 1) dtype=string>), ('age', <tf.Tensor 'ExpandDims:0' shape=(None, 1) dtype=float32>), ('n_siblings_spouses', <tf.Tensor 'ExpandDims_6:0' shape=(None, 1) dtype=int32>), ('parch', <tf.Tensor 'ExpandDims_7:0' shape=(None, 1) dtype=int32>), ('fare', <tf.Tensor 'ExpandDims_5:0' shape=(None, 1) dtype=float32>), ('class', <tf.Tensor 'ExpandDims_2:0' shape=(None, 1) dtype=string>), ('deck', <tf.Tensor 'ExpandDims_3:0' shape=(None, 1) dtype=string>), ('embark_town', <tf.Tensor 'ExpandDims_4:0' shape=(None, 1) dtype=string>), ('alone', <tf.Tensor 'ExpandDims_1:0' shape=(None, 1) dtype=string>)])
Consider rewriting this model with the Functional API.
22/22 [==============================] - 0s 3ms/step - loss: 0.4596 - accuracy: 0.7992
Test Loss 0.45956382155418396, Test Accuracy 0.7992424368858337
```
使用 [`tf.keras.Model.predict`](https://tensorflow.google.cn/api_docs/python/tf/keras/Model#predict) 推断一个批次或多个批次的标签。
```py
predictions = model.predict(test_data)
# 显示部分结果
for prediction, survived in zip(predictions[:10], list(test_data)[0][1][:10]):
print("Predicted survival: {:.2%}".format(prediction[0]),
" | Actual outcome: ",
("SURVIVED" if bool(survived) else "DIED"))
```
```py
WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'collections.OrderedDict'> input: OrderedDict([('sex', <tf.Tensor 'ExpandDims_8:0' shape=(None, 1) dtype=string>), ('age', <tf.Tensor 'ExpandDims:0' shape=(None, 1) dtype=float32>), ('n_siblings_spouses', <tf.Tensor 'ExpandDims_6:0' shape=(None, 1) dtype=int32>), ('parch', <tf.Tensor 'ExpandDims_7:0' shape=(None, 1) dtype=int32>), ('fare', <tf.Tensor 'ExpandDims_5:0' shape=(None, 1) dtype=float32>), ('class', <tf.Tensor 'ExpandDims_2:0' shape=(None, 1) dtype=string>), ('deck', <tf.Tensor 'ExpandDims_3:0' shape=(None, 1) dtype=string>), ('embark_town', <tf.Tensor 'ExpandDims_4:0' shape=(None, 1) dtype=string>), ('alone', <tf.Tensor 'ExpandDims_1:0' shape=(None, 1) dtype=string>)])
Consider rewriting this model with the Functional API.
Predicted survival: 99.81% | Actual outcome: DIED
Predicted survival: 14.77% | Actual outcome: SURVIVED
Predicted survival: 11.87% | Actual outcome: DIED
Predicted survival: 6.05% | Actual outcome: DIED
Predicted survival: 10.83% | Actual outcome: DIED
Predicted survival: 29.45% | Actual outcome: SURVIVED
Predicted survival: 92.37% | Actual outcome: SURVIVED
Predicted survival: 4.18% | Actual outcome: SURVIVED
Predicted survival: 14.32% | Actual outcome: DIED
Predicted survival: 4.36% | Actual outcome: SURVIVED
```
# 使用 tf.data 加载 NumPy 数据
> 原文:[https://tensorflow.google.cn/tutorials/load_data/numpy](https://tensorflow.google.cn/tutorials/load_data/numpy)
**Note:** 我们的 TensorFlow 社区翻译了这些文档。因为社区翻译是尽力而为, 所以无法保证它们是最准确的,并且反映了最新的 [官方英文文档](https://tensorflow.google.cn/?hl=en)。如果您有改进此翻译的建议, 请提交 pull request 到 [tensorflow/docs](https://github.com/tensorflow/docs) GitHub 仓库。要志愿地撰写或者审核译文,请加入 [docs-zh-cn@tensorflow.org Google Group](https://groups.google.com/a/tensorflow.org/forum/#!forum/docs-zh-cn)。
本教程提供了将数据从 NumPy 数组加载到 [`tf.data.Dataset`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset) 的示例。本示例从一个 `.npz` 文件中加载 MNIST 数据集。但是,本示例中 NumPy 数据的来源并不重要。
## 设置
```py
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
```
### 从 `.npz` 文件中加载
```py
DATA_URL = 'https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz'
path = tf.keras.utils.get_file('mnist.npz', DATA_URL)
with np.load(path) as data:
train_examples = data['x_train']
train_labels = data['y_train']
test_examples = data['x_test']
test_labels = data['y_test']
```
## 使用 [`tf.data.Dataset`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset) 加载 NumPy 数组
假设您有一个示例数组和相应的标签数组,请将两个数组作为元组传递给 [`tf.data.Dataset.from_tensor_slices`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#from_tensor_slices) 以创建 [`tf.data.Dataset`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset) 。
```py
train_dataset = tf.data.Dataset.from_tensor_slices((train_examples, train_labels))
test_dataset = tf.data.Dataset.from_tensor_slices((test_examples, test_labels))
```
## 使用该数据集
### 打乱和批次化数据集
```py
BATCH_SIZE = 64
SHUFFLE_BUFFER_SIZE = 100
train_dataset = train_dataset.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
test_dataset = test_dataset.batch(BATCH_SIZE)
```
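`shuffle(buffer_size)` 并不是全局打乱,而是维护一个固定大小的缓冲区并从中随机取样。用纯 Python 可以这样示意(函数名为假设):

```python
import random

def shuffle_stream(items, buffer_size, seed=0):
    # 模拟 Dataset.shuffle缓冲区满后每次随机弹出一个元素作为输出
    rng = random.Random(seed)
    buffer, out = [], []
    for item in items:
        buffer.append(item)
        if len(buffer) > buffer_size:
            out.append(buffer.pop(rng.randrange(len(buffer))))
    while buffer:
        out.append(buffer.pop(rng.randrange(len(buffer))))
    return out

shuffled = shuffle_stream(range(10), buffer_size=4)
print(sorted(shuffled) == list(range(10)))  # True元素不变只是顺序被打乱
```

因此 `buffer_size` 越大,打乱越接近全局随机,代价是更多内存。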
### 建立和训练模型
```py
model = tf.keras.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer=tf.keras.optimizers.RMSprop(),
loss=tf.keras.losses.SparseCategoricalCrossentropy(),
metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
```
```py
model.fit(train_dataset, epochs=10)
```
```py
Epoch 1/10
938/938 [==============================] - 2s 2ms/step - loss: 3.1713 - sparse_categorical_accuracy: 0.8769
Epoch 2/10
938/938 [==============================] - 2s 2ms/step - loss: 0.5085 - sparse_categorical_accuracy: 0.9271
Epoch 3/10
938/938 [==============================] - 2s 2ms/step - loss: 0.3764 - sparse_categorical_accuracy: 0.9466
Epoch 4/10
938/938 [==============================] - 2s 2ms/step - loss: 0.3165 - sparse_categorical_accuracy: 0.9550
Epoch 5/10
938/938 [==============================] - 2s 2ms/step - loss: 0.2812 - sparse_categorical_accuracy: 0.9599
Epoch 6/10
938/938 [==============================] - 2s 2ms/step - loss: 0.2587 - sparse_categorical_accuracy: 0.9645
Epoch 7/10
938/938 [==============================] - 2s 2ms/step - loss: 0.2530 - sparse_categorical_accuracy: 0.9674
Epoch 8/10
938/938 [==============================] - 2s 2ms/step - loss: 0.2192 - sparse_categorical_accuracy: 0.9707
Epoch 9/10
938/938 [==============================] - 2s 2ms/step - loss: 0.2116 - sparse_categorical_accuracy: 0.9721
Epoch 10/10
938/938 [==============================] - 2s 2ms/step - loss: 0.2014 - sparse_categorical_accuracy: 0.9747
<tensorflow.python.keras.callbacks.History at 0x7fe4f37d1470>
```
```py
model.evaluate(test_dataset)
```
```py
157/157 [==============================] - 0s 2ms/step - loss: 0.5586 - sparse_categorical_accuracy: 0.9568
[0.5586389303207397, 0.9567999839782715]
```

# 使用 tf.data 加载 pandas dataframes
> 原文:[https://tensorflow.google.cn/tutorials/load_data/pandas_dataframe](https://tensorflow.google.cn/tutorials/load_data/pandas_dataframe)
**Note:** 我们的 TensorFlow 社区翻译了这些文档。因为社区翻译是尽力而为, 所以无法保证它们是最准确的,并且反映了最新的 [官方英文文档](https://tensorflow.google.cn/?hl=en)。如果您有改进此翻译的建议, 请提交 pull request 到 [tensorflow/docs](https://github.com/tensorflow/docs) GitHub 仓库。要志愿地撰写或者审核译文,请加入 [docs-zh-cn@tensorflow.org Google Group](https://groups.google.com/a/tensorflow.org/forum/#!forum/docs-zh-cn)。
本教程演示了如何将 pandas dataframe 加载到 [`tf.data.Dataset`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset) 中。
本教程使用了一个小型[数据集](https://archive.ics.uci.edu/ml/datasets/heart+Disease),由克利夫兰诊所心脏病基金会(Cleveland Clinic Foundation for Heart Disease)提供。此 CSV 数据集中有几百行数据,每行表示一个患者,每列表示一个属性。我们将使用这些信息来预测患者是否患有心脏病,这是一个二分类问题。
## 使用 pandas 读取数据
```py
!pip install -q tensorflow-gpu==2.0.0-rc1
import pandas as pd
import tensorflow as tf
```
```py
WARNING: You are using pip version 20.2.2; however, version 20.2.3 is available.
You should consider upgrading via the '/tmpfs/src/tf_docs_env/bin/python -m pip install --upgrade pip' command.
```
下载包含心脏数据集的 csv 文件。
```py
csv_file = tf.keras.utils.get_file('heart.csv', 'https://storage.googleapis.com/applied-dl/heart.csv')
```
使用 pandas 读取 csv 文件。
```py
df = pd.read_csv(csv_file)
```
```py
df.head()
```
<devsite-iframe><iframe src="/tutorials/load_data/pandas_dataframe_420ecafb3d5d72c62762d056cc160cddfd15a9fd8290044191c203a794d6d136.frame" class="framebox inherit-locale " allowfullscreen="" is-upgraded=""></iframe></devsite-iframe>
```py
df.dtypes
```
```py
age int64
sex int64
cp int64
trestbps int64
chol int64
fbs int64
restecg int64
thalach int64
exang int64
oldpeak float64
slope int64
ca int64
thal object
target int64
dtype: object
```
将 `thal` 列(数据帧(dataframe)中的 `object` 类型)转换为离散数值。
```py
df['thal'] = pd.Categorical(df['thal'])
df['thal'] = df.thal.cat.codes
```
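`pd.Categorical` 加 `cat.codes` 的效果,可以用纯 Python 示意如下(类别取值为假设的示例,仅为说明"为每个唯一类别分配整数编码"这一行为):

```python
# 假设的 thal 取值;pd.Categorical + cat.codes 为每个唯一类别分配整数编码
thal_values = ['fixed', 'normal', 'reversible', 'normal']

categories = sorted(set(thal_values))            # 类别按字典序排列
code_of = {cat: i for i, cat in enumerate(categories)}
codes = [code_of[v] for v in thal_values]
print(codes)  # [0, 1, 2, 1]
```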
```py
df.head()
```
<devsite-iframe><iframe src="/tutorials/load_data/pandas_dataframe_39d2bcddc17dbd9e94883df635bb9acdb6b07d463cf1ca2ea90daeb2b4275ca7.frame" class="framebox inherit-locale " allowfullscreen="" is-upgraded=""></iframe></devsite-iframe>
## 使用 [`tf.data.Dataset`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset) 读取数据
使用 [`tf.data.Dataset.from_tensor_slices`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#from_tensor_slices) 从 pandas dataframe 中读取数值。
使用 [`tf.data.Dataset`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset) 的其中一个优势是可以允许您写一些简单而又高效的数据管道data pipelines)。从 [loading data guide](https://tensorflow.google.cn/guide/data) 可以了解更多。
```py
target = df.pop('target')
```
```py
dataset = tf.data.Dataset.from_tensor_slices((df.values, target.values))
```
```py
for feat, targ in dataset.take(5):
print ('Features: {}, Target: {}'.format(feat, targ))
```
```py
Features: [ 63. 1. 1. 145. 233. 1. 2. 150. 0. 2.3 3. 0.
 2. ], Target: 0
Features: [ 67. 1. 4. 160. 286. 0. 2. 108. 1. 1.5 2. 3.
 3. ], Target: 1
Features: [ 67. 1. 4. 120. 229. 0. 2. 129. 1. 2.6 2. 2.
 4. ], Target: 0
Features: [ 37. 1. 3. 130. 250. 0. 0. 187. 0. 3.5 3. 0.
 3. ], Target: 0
Features: [ 41. 0. 2. 130. 204. 0. 2. 172. 0. 1.4 1. 0.
 3. ], Target: 0
```
由于 `pd.Series` 实现了 `__array__` 协议,因此几乎可以在任何使用 `np.array` 或 [`tf.Tensor`](https://tensorflow.google.cn/api_docs/python/tf/Tensor) 的地方透明地使用它。
```py
tf.constant(df['thal'])
```
```py
<tf.Tensor: id=21, shape=(303,), dtype=int32, numpy=
array([2, 3, 4, 3, 3, 3, 3, 3, 4, 4, 2, 3, 2, 4, 4, 3, 4, 3, 3, 3, 3, 3,
3, 4, 4, 3, 3, 3, 3, 4, 3, 4, 3, 4, 3, 3, 4, 2, 4, 3, 4, 3, 4, 4,
2, 3, 3, 4, 3, 3, 4, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 4, 4, 3, 3, 4,
4, 2, 3, 3, 4, 3, 4, 3, 3, 4, 4, 3, 3, 4, 4, 3, 3, 3, 3, 4, 4, 4,
3, 3, 4, 3, 4, 4, 3, 4, 3, 3, 3, 4, 3, 4, 4, 3, 3, 4, 4, 4, 4, 4,
3, 3, 3, 3, 4, 3, 4, 3, 4, 4, 3, 3, 2, 4, 4, 2, 3, 3, 4, 4, 3, 4,
3, 3, 4, 2, 4, 4, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4,
4, 3, 3, 3, 4, 3, 4, 3, 4, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 4, 3, 4, 3, 2,
4, 4, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 2, 2, 4, 3, 4, 2, 4, 3,
3, 4, 3, 3, 3, 3, 4, 3, 4, 3, 4, 2, 2, 4, 3, 4, 3, 2, 4, 3, 3, 2,
4, 4, 4, 4, 3, 0, 3, 3, 3, 3, 1, 4, 3, 3, 3, 4, 3, 4, 3, 3, 3, 4,
3, 3, 4, 4, 4, 4, 3, 3, 4, 3, 4, 3, 4, 4, 3, 4, 4, 3, 4, 4, 3, 3,
3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 3, 2, 4, 4, 4, 4], dtype=int32)>
```
随机读取shuffle并批量处理数据集。
```py
train_dataset = dataset.shuffle(len(df)).batch(1)
```
## 创建并训练模型
```py
def get_compiled_model():
model = tf.keras.Sequential([
tf.keras.layers.Dense(10, activation='relu'),
tf.keras.layers.Dense(10, activation='relu'),
tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
return model
```
```py
model = get_compiled_model()
model.fit(train_dataset, epochs=15)
```
```py
WARNING:tensorflow:Layer sequential is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2. The layer has dtype float32 because it's dtype defaults to floatx.
If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.
To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.
Epoch 1/15
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_core/python/ops/nn_impl.py:183: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:Entity <function Function._initialize_uninitialized_variables.<locals>.initialize_variables at 0x7f3d7029f620> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: module 'gast' has no attribute 'Num'
WARNING: Entity <function Function._initialize_uninitialized_variables.<locals>.initialize_variables at 0x7f3d7029f620> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: module 'gast' has no attribute 'Num'
303/303 [==============================] - 1s 4ms/step - loss: 3.8214 - accuracy: 0.5149
Epoch 2/15
303/303 [==============================] - 0s 1ms/step - loss: 0.9302 - accuracy: 0.6766
Epoch 3/15
303/303 [==============================] - 0s 1ms/step - loss: 0.8203 - accuracy: 0.6964
Epoch 4/15
303/303 [==============================] - 0s 1ms/step - loss: 0.7565 - accuracy: 0.7162
Epoch 5/15
303/303 [==============================] - 0s 1ms/step - loss: 0.6607 - accuracy: 0.7162
Epoch 6/15
303/303 [==============================] - 0s 1ms/step - loss: 0.6804 - accuracy: 0.6931
Epoch 7/15
303/303 [==============================] - 0s 1ms/step - loss: 0.5967 - accuracy: 0.7525
Epoch 8/15
303/303 [==============================] - 0s 1ms/step - loss: 0.6198 - accuracy: 0.7228
Epoch 9/15
303/303 [==============================] - 0s 1ms/step - loss: 0.5584 - accuracy: 0.7624
Epoch 10/15
303/303 [==============================] - 0s 1ms/step - loss: 0.5611 - accuracy: 0.7756
Epoch 11/15
303/303 [==============================] - 0s 1ms/step - loss: 0.5364 - accuracy: 0.7492
Epoch 12/15
303/303 [==============================] - 0s 1ms/step - loss: 0.5042 - accuracy: 0.7822
Epoch 13/15
303/303 [==============================] - 0s 1ms/step - loss: 0.5168 - accuracy: 0.7624
Epoch 14/15
303/303 [==============================] - 0s 1ms/step - loss: 0.4560 - accuracy: 0.8053
Epoch 15/15
303/303 [==============================] - 0s 1ms/step - loss: 0.4350 - accuracy: 0.7987
<tensorflow.python.keras.callbacks.History at 0x7f3d7f250048>
```
## 代替特征列
将字典作为输入传递给模型,就像为 [`tf.keras.layers.Input`](https://tensorflow.google.cn/api_docs/python/tf/keras/Input) 层创建一个匹配的字典一样简单:应用任何预处理,并借助 [functional api](https://tensorflow.google.cn/guide/keras/functional) 将各层连接起来。您可以使用这种方式代替 [feature columns](https://tensorflow.google.cn/tutorials/keras/feature_columns)。
```py
inputs = {key: tf.keras.layers.Input(shape=(), name=key) for key in df.keys()}
x = tf.stack(list(inputs.values()), axis=-1)
x = tf.keras.layers.Dense(10, activation='relu')(x)
output = tf.keras.layers.Dense(1, activation='sigmoid')(x)
model_func = tf.keras.Model(inputs=inputs, outputs=output)
model_func.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
```
与 [`tf.data`](https://tensorflow.google.cn/api_docs/python/tf/data) 一起使用时,保存 `pd.DataFrame` 列结构的最简单方法是将 `pd.DataFrame` 转换为 `dict` ,并对该字典进行切片。
```py
dict_slices = tf.data.Dataset.from_tensor_slices((df.to_dict('list'), target.values)).batch(16)
```
```py
for dict_slice in dict_slices.take(1):
print (dict_slice)
```
```py
({'age': <tf.Tensor: id=14781, shape=(16,), dtype=int32, numpy=
array([63, 67, 67, 37, 41, 56, 62, 57, 63, 53, 57, 56, 56, 44, 52, 57],
dtype=int32)>, 'sex': <tf.Tensor: id=14789, shape=(16,), dtype=int32, numpy=array([1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1], dtype=int32)>, 'cp': <tf.Tensor: id=14784, shape=(16,), dtype=int32, numpy=array([1, 4, 4, 3, 2, 2, 4, 4, 4, 4, 4, 2, 3, 2, 3, 3], dtype=int32)>, 'trestbps': <tf.Tensor: id=14793, shape=(16,), dtype=int32, numpy=
array([145, 160, 120, 130, 130, 120, 140, 120, 130, 140, 140, 140, 130,
120, 172, 150], dtype=int32)>, 'chol': <tf.Tensor: id=14783, shape=(16,), dtype=int32, numpy=
array([233, 286, 229, 250, 204, 236, 268, 354, 254, 203, 192, 294, 256,
263, 199, 168], dtype=int32)>, 'fbs': <tf.Tensor: id=14786, shape=(16,), dtype=int32, numpy=array([1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0], dtype=int32)>, 'restecg': <tf.Tensor: id=14788, shape=(16,), dtype=int32, numpy=array([2, 2, 2, 0, 2, 0, 2, 0, 2, 2, 0, 2, 2, 0, 0, 0], dtype=int32)>, 'thalach': <tf.Tensor: id=14792, shape=(16,), dtype=int32, numpy=
array([150, 108, 129, 187, 172, 178, 160, 163, 147, 155, 148, 153, 142,
173, 162, 174], dtype=int32)>, 'exang': <tf.Tensor: id=14785, shape=(16,), dtype=int32, numpy=array([0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0], dtype=int32)>, 'oldpeak': <tf.Tensor: id=14787, shape=(16,), dtype=float32, numpy=
array([2.3, 1.5, 2.6, 3.5, 1.4, 0.8, 3.6, 0.6, 1.4, 3.1, 0.4, 1.3, 0.6,
0\. , 0.5, 1.6], dtype=float32)>, 'slope': <tf.Tensor: id=14790, shape=(16,), dtype=int32, numpy=array([3, 2, 2, 3, 1, 1, 3, 1, 2, 3, 2, 2, 2, 1, 1, 1], dtype=int32)>, 'ca': <tf.Tensor: id=14782, shape=(16,), dtype=int32, numpy=array([0, 3, 2, 0, 0, 0, 2, 0, 1, 0, 0, 0, 1, 0, 0, 0], dtype=int32)>, 'thal': <tf.Tensor: id=14791, shape=(16,), dtype=int32, numpy=array([2, 3, 4, 3, 3, 3, 3, 3, 4, 4, 2, 3, 2, 4, 4, 3], dtype=int32)>}, <tf.Tensor: id=14794, shape=(16,), dtype=int64, numpy=array([0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0])>)
```
```py
model_func.fit(dict_slices, epochs=15)
```
```py
Epoch 1/15
WARNING:tensorflow:Entity <function Function._initialize_uninitialized_variables.<locals>.initialize_variables at 0x7f3d2c33a510> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: module 'gast' has no attribute 'Num'
WARNING: Entity <function Function._initialize_uninitialized_variables.<locals>.initialize_variables at 0x7f3d2c33a510> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: module 'gast' has no attribute 'Num'
19/19 [==============================] - 1s 30ms/step - loss: 17.3744 - accuracy: 0.7261
Epoch 2/15
19/19 [==============================] - 0s 3ms/step - loss: 9.7210 - accuracy: 0.7261
Epoch 3/15
19/19 [==============================] - 0s 3ms/step - loss: 5.0425 - accuracy: 0.6106
Epoch 4/15
19/19 [==============================] - 0s 3ms/step - loss: 4.8356 - accuracy: 0.5182
Epoch 5/15
19/19 [==============================] - 0s 3ms/step - loss: 4.4312 - accuracy: 0.5743
Epoch 6/15
19/19 [==============================] - 0s 3ms/step - loss: 4.2668 - accuracy: 0.5644
Epoch 7/15
19/19 [==============================] - 0s 3ms/step - loss: 4.1296 - accuracy: 0.5776
Epoch 8/15
19/19 [==============================] - 0s 3ms/step - loss: 4.0027 - accuracy: 0.5776
Epoch 9/15
19/19 [==============================] - 0s 3ms/step - loss: 3.8945 - accuracy: 0.5776
Epoch 10/15
19/19 [==============================] - 0s 3ms/step - loss: 3.7877 - accuracy: 0.5776
Epoch 11/15
19/19 [==============================] - 0s 3ms/step - loss: 3.6851 - accuracy: 0.5776
Epoch 12/15
19/19 [==============================] - 0s 3ms/step - loss: 3.5828 - accuracy: 0.5743
Epoch 13/15
19/19 [==============================] - 0s 3ms/step - loss: 3.4813 - accuracy: 0.5776
Epoch 14/15
19/19 [==============================] - 0s 3ms/step - loss: 3.3808 - accuracy: 0.5842
Epoch 15/15
19/19 [==============================] - 0s 3ms/step - loss: 3.2814 - accuracy: 0.5842
<tensorflow.python.keras.callbacks.History at 0x7f3d2c3a0828>
```

# Unicode 字符串
> 原文:[https://tensorflow.google.cn/tutorials/load_data/unicode](https://tensorflow.google.cn/tutorials/load_data/unicode)
## 简介
处理自然语言的模型通常使用不同的字符集来处理不同的语言。*Unicode* 是一种标准的编码系统,用于表示几乎所有语言的字符。每个字符使用 `0``0x10FFFF` 之间的唯一整数[码位](https://en.wikipedia.org/wiki/Code_point)进行编码。*Unicode 字符串*是由零个或更多码位组成的序列。
本教程介绍了如何在 TensorFlow 中表示 Unicode 字符串,以及如何使用标准字符串运算的 Unicode 等效项对其进行操作。它会根据字符体系检测将 Unicode 字符串划分为不同词例。
```py
import tensorflow as tf
```
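在进入 TensorFlow 的 API 之前,可以先用纯 Python 回顾字符与码位的对应关系:

```python
# 字符与码位互转:ord() 返回码位,chr() 由码位还原字符
s = '语'
cp = ord(s)
print(hex(cp))              # 0x8bed
print(chr(cp) == s)         # True
print(0 <= cp <= 0x10FFFF)  # True:所有码位都落在该区间内
```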
## [`tf.string`](https://tensorflow.google.cn/api_docs/python/tf#string) 数据类型
您可以使用基本的 TensorFlow [`tf.string`](https://tensorflow.google.cn/api_docs/python/tf#string) `dtype` 构建字节字符串张量。Unicode 字符串默认使用 UTF-8 编码。
```py
tf.constant(u"Thanks 😊")
```

```py
<tf.Tensor: shape=(), dtype=string, numpy=b'Thanks \xf0\x9f\x98\x8a'>
```
[`tf.string`](https://tensorflow.google.cn/api_docs/python/tf#string) 张量可以容纳不同长度的字节字符串,因为字节字符串会被视为原子单元。字符串长度不包括在张量维度中。
```py
tf.constant([u"You're", u"welcome!"]).shape
```

```py
TensorShape([2])
```
使用 Python 构造字符串时,v2 和 v3 对 Unicode 的处理方式有所不同。在 v2 中,Unicode 字符串用前缀 `u` 表示(如上所示);在 v3 中,字符串默认使用 Unicode 编码。
## 表示 Unicode
TensorFlow 中有两种表示 Unicode 字符串的标准方式:
* `string` 标量 - 使用已知[字符编码](https://en.wikipedia.org/wiki/Character_encoding)对码位序列进行编码。
* `int32` 向量 - 每个位置包含单个码位。
例如,以下三个值均表示 Unicode 字符串 `"语言处理"`:
```py
# Unicode string, represented as a UTF-8 encoded string scalar.
text_utf8 = tf.constant(u"语言处理")
text_utf8
```

```py
<tf.Tensor: shape=(), dtype=string, numpy=b'\xe8\xaf\xad\xe8\xa8\x80\xe5\xa4\x84\xe7\x90\x86'>
```

```py
# Unicode string, represented as a UTF-16-BE encoded string scalar.
text_utf16be = tf.constant(u"语言处理".encode("UTF-16-BE"))
text_utf16be
```

```py
<tf.Tensor: shape=(), dtype=string, numpy=b'\x8b\xed\x8a\x00Y\x04t\x06'>
```

```py
# Unicode string, represented as a vector of Unicode code points.
text_chars = tf.constant([ord(char) for char in u"语言处理"])
text_chars
```

```py
<tf.Tensor: shape=(4,), dtype=int32, numpy=array([35821, 35328, 22788, 29702], dtype=int32)>
```
### 在不同表示之间进行转换
TensorFlow 提供了在下列不同表示之间进行转换的运算:
* [`tf.strings.unicode_decode`](https://tensorflow.google.cn/api_docs/python/tf/strings/unicode_decode):将编码的字符串标量转换为码位的向量。
* [`tf.strings.unicode_encode`](https://tensorflow.google.cn/api_docs/python/tf/strings/unicode_encode):将码位的向量转换为编码的字符串标量。
* [`tf.strings.unicode_transcode`](https://tensorflow.google.cn/api_docs/python/tf/strings/unicode_transcode):将编码的字符串标量转换为其他编码。
```py
tf.strings.unicode_decode(text_utf8,
                          input_encoding='UTF-8')
```

```py
<tf.Tensor: shape=(4,), dtype=int32, numpy=array([35821, 35328, 22788, 29702], dtype=int32)>
```

```py
tf.strings.unicode_encode(text_chars,
                          output_encoding='UTF-8')
```

```py
<tf.Tensor: shape=(), dtype=string, numpy=b'\xe8\xaf\xad\xe8\xa8\x80\xe5\xa4\x84\xe7\x90\x86'>
```

```py
tf.strings.unicode_transcode(text_utf8,
                             input_encoding='UTF8',
                             output_encoding='UTF-16-BE')
```

```py
<tf.Tensor: shape=(), dtype=string, numpy=b'\x8b\xed\x8a\x00Y\x04t\x06'>
```
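这三种转换也可以用纯 Python 的字符串方法示意,结果与上面的张量内容一致:

```python
s = '语言处理'

codepoints = [ord(c) for c in s]        # 对应"码位向量"表示
utf8_bytes = s.encode('UTF-8')          # 对应 UTF-8 编码的字符串标量
utf16be_bytes = s.encode('UTF-16-BE')   # 对应转码为 UTF-16-BE 的结果

print(codepoints)                       # [35821, 35328, 22788, 29702]
print(utf8_bytes.decode('UTF-8') == s)  # True
print(utf16be_bytes)                    # b'\x8b\xed\x8a\x00Y\x04t\x06'
```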
### 批次维度
解码多个字符串时,每个字符串中的字符数可能不相等。返回结果是 [`tf.RaggedTensor`](https://tensorflow.google.cn/guide/ragged_tensor),其中最里面的维度的长度会根据每个字符串中的字符数而变化。
```py
# A batch of Unicode strings, each represented as a UTF8-encoded string.
batch_utf8 = [s.encode('UTF-8') for s in
              [u'hÃllo', u'What is the weather tomorrow', u'Göödnight', u'😊']]
batch_chars_ragged = tf.strings.unicode_decode(batch_utf8,
                                               input_encoding='UTF-8')
for sentence_chars in batch_chars_ragged.to_list():
  print(sentence_chars)
```

```py
[104, 195, 108, 108, 111]
[87, 104, 97, 116, 32, 105, 115, 32, 116, 104, 101, 32, 119, 101, 97, 116, 104, 101, 114, 32, 116, 111, 109, 111, 114, 114, 111, 119]
[71, 246, 246, 100, 110, 105, 103, 104, 116]
[128522]
```
您可以直接使用此 [`tf.RaggedTensor`](https://tensorflow.google.cn/api_docs/python/tf/RaggedTensor)也可以使用 [`tf.RaggedTensor.to_tensor`](https://tensorflow.google.cn/api_docs/python/tf/RaggedTensor#to_tensor) 和 [`tf.RaggedTensor.to_sparse`](https://tensorflow.google.cn/api_docs/python/tf/RaggedTensor#to_sparse) 方法将其转换为带有填充的密集 [`tf.Tensor`](https://tensorflow.google.cn/api_docs/python/tf/Tensor) 或 [`tf.SparseTensor`](https://tensorflow.google.cn/api_docs/python/tf/sparse/SparseTensor)。
```py
batch_chars_padded = batch_chars_ragged.to_tensor(default_value=-1)
print(batch_chars_padded.numpy())
```

```py
[[ 104 195 108 108 111 -1 -1 -1 -1 -1
 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 -1 -1 -1 -1 -1 -1 -1 -1]
 [ 87 104 97 116 32 105 115 32 116 104
 101 32 119 101 97 116 104 101 114 32
 116 111 109 111 114 114 111 119]
 [ 71 246 246 100 110 105 103 104 116 -1
 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 -1 -1 -1 -1 -1 -1 -1 -1]
 [128522 -1 -1 -1 -1 -1 -1 -1 -1 -1
 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
 -1 -1 -1 -1 -1 -1 -1 -1]]
```

```py
batch_chars_sparse = batch_chars_ragged.to_sparse()
```
在对多个具有相同长度的字符串进行编码时,可以将 [`tf.Tensor`](https://tensorflow.google.cn/api_docs/python/tf/Tensor) 用作输入:
```py
tf.strings.unicode_encode([[99, 97, 116], [100, 111, 103], [99, 111, 119]],
                          output_encoding='UTF-8')
```

```py
<tf.Tensor: shape=(3,), dtype=string, numpy=array([b'cat', b'dog', b'cow'], dtype=object)>
```
当对多个具有不同长度的字符串进行编码时,应将 [`tf.RaggedTensor`](https://tensorflow.google.cn/api_docs/python/tf/RaggedTensor) 用作输入:
```py
tf.strings.unicode_encode(batch_chars_ragged, output_encoding='UTF-8')
```

```py
<tf.Tensor: shape=(4,), dtype=string, numpy=
array([b'h\xc3\x83llo', b'What is the weather tomorrow',
       b'G\xc3\xb6\xc3\xb6dnight', b'\xf0\x9f\x98\x8a'], dtype=object)>
```
如果您的张量具有填充或稀疏格式的多个字符串,请在调用 `unicode_encode` 之前将其转换为 [`tf.RaggedTensor`](https://tensorflow.google.cn/api_docs/python/tf/RaggedTensor):
```py
tf.strings.unicode_encode(
    tf.RaggedTensor.from_sparse(batch_chars_sparse),
    output_encoding='UTF-8')
```

```py
<tf.Tensor: shape=(4,), dtype=string, numpy=
array([b'h\xc3\x83llo', b'What is the weather tomorrow',
       b'G\xc3\xb6\xc3\xb6dnight', b'\xf0\x9f\x98\x8a'], dtype=object)>
```

```py
tf.strings.unicode_encode(
    tf.RaggedTensor.from_tensor(batch_chars_padded, padding=-1),
    output_encoding='UTF-8')
```

```py
<tf.Tensor: shape=(4,), dtype=string, numpy=
array([b'h\xc3\x83llo', b'What is the weather tomorrow',
       b'G\xc3\xb6\xc3\xb6dnight', b'\xf0\x9f\x98\x8a'], dtype=object)>
```
## Unicode 运算
### 字符长度
[`tf.strings.length`](https://tensorflow.google.cn/api_docs/python/tf/strings/length) 运算具有 `unit` 参数,该参数表示计算长度的方式。`unit` 默认为 `"BYTE"`,但也可以将其设置为其他值,例如 `"UTF8_CHAR"` 或 `"UTF16_CHAR"`,以确定每个已编码 `string` 中的 Unicode 码位数量。
```py
# Note that the final character takes up 4 bytes in UTF8.
thanks = u'Thanks 😊'.encode('UTF-8')
num_bytes = tf.strings.length(thanks).numpy()
num_chars = tf.strings.length(thanks, unit='UTF8_CHAR').numpy()
print('{} bytes; {} UTF-8 characters'.format(num_bytes, num_chars))
```

```py
11 bytes; 8 UTF-8 characters
```
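同样的区别在纯 Python 中对应于"编码后字节数"与"码位数":

```python
thanks = 'Thanks 😊'
num_bytes = len(thanks.encode('UTF-8'))  # 字节数:emoji 占 4 字节
num_chars = len(thanks)                  # 码位数:emoji 记为 1 个字符
print('{} bytes; {} UTF-8 characters'.format(num_bytes, num_chars))  # 11 bytes; 8 UTF-8 characters
```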
### 字符子字符串
类似地,[`tf.strings.substr`](https://tensorflow.google.cn/api_docs/python/tf/strings/substr) 运算会接受 "`unit`" 参数,并用它来确定 "`pos`" 和 "`len`" 参数包含的偏移类型。
```py
# default: unit='BYTE'. With len=1, we return a single byte.
tf.strings.substr(thanks, pos=7, len=1).numpy()
```

```py
b'\xf0'
```

```py
# Specifying unit='UTF8_CHAR', we return a single character, which in this case
# is 4 bytes.
print(tf.strings.substr(thanks, pos=7, len=1, unit='UTF8_CHAR').numpy())
```

```py
b'\xf0\x9f\x98\x8a'
```
### 拆分 Unicode 字符串
[`tf.strings.unicode_split`](https://tensorflow.google.cn/api_docs/python/tf/strings/unicode_split) 运算会将 Unicode 字符串拆分为单个字符的子字符串:
```py
tf.strings.unicode_split(thanks, 'UTF-8').numpy()
```

```py
array([b'T', b'h', b'a', b'n', b'k', b's', b' ', b'\xf0\x9f\x98\x8a'],
      dtype=object)
```
### 字符的字节偏移量
为了将 [`tf.strings.unicode_decode`](https://tensorflow.google.cn/api_docs/python/tf/strings/unicode_decode) 生成的字符张量与原始字符串对齐,了解每个字符开始位置的偏移量很有用。方法 [`tf.strings.unicode_decode_with_offsets`](https://tensorflow.google.cn/api_docs/python/tf/strings/unicode_decode_with_offsets) 与 `unicode_decode` 类似,不同的是它会返回第二个张量,其中包含每个字符的起始偏移量。
```py
codepoints, offsets = tf.strings.unicode_decode_with_offsets(u"🎈🎉🎊", 'UTF-8')
for (codepoint, offset) in zip(codepoints.numpy(), offsets.numpy()):
  print("At byte offset {}: codepoint {}".format(offset, codepoint))
```

```py
At byte offset 0: codepoint 127880
At byte offset 4: codepoint 127881
At byte offset 8: codepoint 127882
```
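字节偏移量的计算可以用纯 Python 验证:逐字符累加其 UTF-8 编码长度即可:

```python
s = '🎈🎉🎊'
offsets, codepoints, pos = [], [], 0
for ch in s:
    offsets.append(pos)
    codepoints.append(ord(ch))
    pos += len(ch.encode('UTF-8'))  # 该字符占用的 UTF-8 字节数

print(offsets)     # [0, 4, 8]
print(codepoints)  # [127880, 127881, 127882]
```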
## Unicode 字符体系
每个 Unicode 码位都属于某个码位集合,这些集合被称作[字符体系](https://en.wikipedia.org/wiki/Script_%28Unicode%29)。某个字符的字符体系有助于确定该字符可能所属的语言。例如,已知 'Б' 属于西里尔字符体系,表明包含该字符的现代文本很可能来自某个斯拉夫语种(如俄语或乌克兰语)。
TensorFlow 提供了 [`tf.strings.unicode_script`](https://tensorflow.google.cn/api_docs/python/tf/strings/unicode_script) 运算来确定某一给定码位使用的是哪个字符体系。字符体系代码是 `int32` 值,对应于[国际 Unicode 组件](http://site.icu-project.org/home) (ICU) 的 [`UScriptCode`](http://icu-project.org/apiref/icu4c/uscript_8h.html) 值。
```py
uscript = tf.strings.unicode_script([33464, 1041])  # ['芸', 'Б']
print(uscript.numpy())  # [17, 8] == [USCRIPT_HAN, USCRIPT_CYRILLIC]
```

```py
[17  8]
```
[`tf.strings.unicode_script`](https://tensorflow.google.cn/api_docs/python/tf/strings/unicode_script) 运算还可以应用于码位的多维 [`tf.Tensor`](https://tensorflow.google.cn/api_docs/python/tf/Tensor) 或 [`tf.RaggedTensor`](https://tensorflow.google.cn/api_docs/python/tf/RaggedTensor):
```py
print(tf.strings.unicode_script(batch_chars_ragged))
```

```py
<tf.RaggedTensor [[25, 25, 25, 25, 25], [25, 25, 25, 25, 0, 25, 25, 0, 25, 25, 25, 0, 25, 25, 25, 25, 25, 25, 25, 0, 25, 25, 25, 25, 25, 25, 25, 25], [25, 25, 25, 25, 25, 25, 25, 25, 25], [0]]>
```
## 示例:简单分词
分词是将文本拆分为类似单词的单元的任务。当使用空格字符分隔单词时,这通常很容易;但是某些语言(如中文和日语)不使用空格,而某些语言(如德语)中存在长复合词,必须进行拆分才能分析其含义。在网页文本中,不同语言和字符体系常常混合在一起,例如"NY 株価"(纽约证券交易所)。
我们可以利用字符体系的变化进行粗略分词(不实现任何 ML 模型),从而估算词边界。这对类似上面"NY 株価"示例的字符串有效。这种方法对大多数使用空格的语言也都有效,因为各种字符体系中的空格字符都归类为 USCRIPT_COMMON,这是一种不同于任何实际文本的特殊字符体系代码。
```py
# dtype: string; shape: [num_sentences]
#
# The sentences to process. Edit this line to try out different inputs!
sentence_texts = [u'Hello, world.', u'世界こんにちは']
```
首先,我们将句子解码为字符码位,然后查找每个字符的字符体系标识符。
```py
# dtype: int32; shape: [num_sentences, (num_chars_per_sentence)]
#
# sentence_char_codepoint[i, j] is the codepoint for the j'th character in
# the i'th sentence.
sentence_char_codepoint = tf.strings.unicode_decode(sentence_texts, 'UTF-8')
print(sentence_char_codepoint)

# dtype: int32; shape: [num_sentences, (num_chars_per_sentence)]
#
# sentence_char_scripts[i, j] is the unicode script of the j'th character in
# the i'th sentence.
sentence_char_script = tf.strings.unicode_script(sentence_char_codepoint)
print(sentence_char_script)
```

```py
<tf.RaggedTensor [[72, 101, 108, 108, 111, 44, 32, 119, 111, 114, 108, 100, 46], [19990, 30028, 12371, 12435, 12395, 12385, 12399]]>
<tf.RaggedTensor [[25, 25, 25, 25, 25, 0, 0, 25, 25, 25, 25, 25, 0], [17, 17, 20, 20, 20, 20, 20]]>
```
接下来,我们使用这些字符体系标识符来确定添加词边界的位置:在每个句子的开头添加一个词边界;如果某个字符与前一个字符属于不同的字符体系,也为该字符添加词边界。
```py
# dtype: bool; shape: [num_sentences, (num_chars_per_sentence)]
#
# sentence_char_starts_word[i, j] is True if the j'th character in the i'th
# sentence is the start of a word.
sentence_char_starts_word = tf.concat(
    [tf.fill([sentence_char_script.nrows(), 1], True),
     tf.not_equal(sentence_char_script[:, 1:], sentence_char_script[:, :-1])],
    axis=1)

# dtype: int64; shape: [num_words]
#
# word_starts[i] is the index of the character that starts the i'th word (in
# the flattened list of characters from all sentences).
word_starts = tf.squeeze(tf.where(sentence_char_starts_word.values), axis=1)
print(word_starts)
```

```py
tf.Tensor([ 0  5  7 12 13 15], shape=(6,), dtype=int64)
```
然后,我们可以使用这些起始偏移量来构建 `RaggedTensor`,它包含了所有批次的单词列表:
```py
# dtype: int32; shape: [num_words, (num_chars_per_word)]
#
# word_char_codepoint[i, j] is the codepoint for the j'th character in the
# i'th word.
word_char_codepoint = tf.RaggedTensor.from_row_starts(
    values=sentence_char_codepoint.values,
    row_starts=word_starts)
print(word_char_codepoint)
```

```py
<tf.RaggedTensor [[72, 101, 108, 108, 111], [44, 32], [119, 111, 114, 108, 100], [46], [19990, 30028], [12371, 12435, 12395, 12385, 12399]]>
```
最后,我们可以将词码位 `RaggedTensor` 划分回句子中:
```py
# dtype: int64; shape: [num_sentences]
#
# sentence_num_words[i] is the number of words in the i'th sentence.
sentence_num_words = tf.reduce_sum(
    tf.cast(sentence_char_starts_word, tf.int64),
    axis=1)

# dtype: int32; shape: [num_sentences, (num_words_per_sentence), (num_chars_per_word)]
#
# sentence_word_char_codepoint[i, j, k] is the codepoint for the k'th character
# in the j'th word in the i'th sentence.
sentence_word_char_codepoint = tf.RaggedTensor.from_row_lengths(
    values=word_char_codepoint,
    row_lengths=sentence_num_words)
print(sentence_word_char_codepoint)
```

```py
<tf.RaggedTensor [[[72, 101, 108, 108, 111], [44, 32], [119, 111, 114, 108, 100], [46]], [[19990, 30028], [12371, 12435, 12395, 12385, 12399]]]>
```
为了使最终结果更易于阅读,我们可以将其重新编码为 UTF-8 字符串:
```py
tf.strings.unicode_encode(sentence_word_char_codepoint, 'UTF-8').to_list()
```

```py
[[b'Hello', b', ', b'world', b'.'],
 [b'\xe4\xb8\x96\xe7\x95\x8c',
  b'\xe3\x81\x93\xe3\x82\x93\xe3\x81\xab\xe3\x81\xa1\xe3\x81\xaf']]
```
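上述"按字符体系变化切分"的思路,也可以用一个极简的纯 Python 草图来示意。注意:这里的 `simple_script` 只是为演示而假设的粗略分类函数,并不等价于 `tf.strings.unicode_script`:

```python
def simple_script(ch):
    # 假设的粗略分类:拉丁字母 / CJK 及假名 / 其他(标点、空格等)
    cp = ord(ch)
    if (65 <= cp <= 90) or (97 <= cp <= 122):
        return 'LATIN'
    if cp >= 0x3000:
        return 'CJK'
    return 'COMMON'

def segment(text):
    # 在相邻字符的"字符体系"发生变化处切分
    words, current = [], ''
    for ch in text:
        if current and simple_script(ch) != simple_script(current[-1]):
            words.append(current)
            current = ''
        current += ch
    if current:
        words.append(current)
    return words

print(segment('Hello, world.'))  # ['Hello', ', ', 'world', '.']
```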

# TF.Text
> 原文:[https://tensorflow.google.cn/tutorials/tensorflow_text/intro](https://tensorflow.google.cn/tutorials/tensorflow_text/intro)
## Introduction
TensorFlow Text provides a collection of text-related classes and ops ready to use with TensorFlow 2.0. The library can perform the preprocessing regularly required by text-based models, and includes other features useful for sequence modeling not provided by core TensorFlow.
The benefit of using these ops in your text preprocessing is that they are done in the TensorFlow graph. You do not need to worry about tokenization in training being different than the tokenization at inference, or managing preprocessing scripts.
## Eager Execution
TensorFlow Text requires TensorFlow 2.0, and is fully compatible with eager mode and graph mode.
* * *
**Note:** On rare occasions, this import may fail looking for the TF library. Please reset the runtime and rerun the `pip install -q` below.
```py
!pip install -q tensorflow-text
```
```py
DEPRECATION: Python 3.4 support has been deprecated. pip 19.1 will be the last one supporting it. Please upgrade your Python as Python 3.4 won't be maintained after March 2019 (cf PEP 429).
```
```py
import tensorflow as tf
import tensorflow_text as text
```
## Unicode
Most ops expect that the strings are in UTF-8. If you're using a different encoding, you can use the core TensorFlow transcode op to transcode into UTF-8. You can also use the same op to coerce your string to structurally valid UTF-8 if your input could be invalid.
```py
docs = tf.constant([u'Everything not saved will be lost.'.encode('UTF-16-BE'), u'Sad☹'.encode('UTF-16-BE')])
utf8_docs = tf.strings.unicode_transcode(docs, input_encoding='UTF-16-BE', output_encoding='UTF-8')
```
## Tokenization
Tokenization is the process of breaking up a string into tokens. Commonly, these tokens are words, numbers, and/or punctuation.
The main interfaces are `Tokenizer` and `TokenizerWithOffsets` which each have a single method `tokenize` and `tokenize_with_offsets` respectively. There are multiple tokenizers available now. Each of these implement `TokenizerWithOffsets` (which extends `Tokenizer`) which includes an option for getting byte offsets into the original string. This allows the caller to know the bytes in the original string the token was created from.
All of the tokenizers return RaggedTensors with the inner-most dimension of tokens mapping to the original individual strings. As a result, the resulting shape's rank is increased by one. Please review the ragged tensor guide if you are unfamiliar with them. https://www.tensorflow.org/guide/ragged_tensors
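As a plain-Python sketch of the shape behavior (not using TF.Text): tokenizing a rank-1 batch of strings yields a rank-2 ragged result, one token list per input string:

```python
# Plain-Python sketch: whitespace tokenization of a batch of strings.
# Input rank is 1 (a list of strings); output rank is 2 (a list of
# token lists), with ragged inner lengths.
docs = ['everything not saved will be lost.', 'Sad']

tokenized = [d.split() for d in docs]
print(tokenized)
print([len(t) for t in tokenized])  # [6, 1]
```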
### WhitespaceTokenizer
This is a basic tokenizer that splits UTF-8 strings on ICU defined whitespace characters (eg. space, tab, new line).
```py
tokenizer = text.WhitespaceTokenizer()
tokens = tokenizer.tokenize(['everything not saved will be lost.', u'Sad☹'.encode('UTF-8')])
print(tokens.to_list())
```
```py
WARNING: Logging before flag parsing goes to stderr.
W0701 13:16:14.667488 140633166759744 deprecation.py:323] From /tmpfs/src/tf_docs_env/lib/python3.4/site-packages/tensorflow/python/util/dispatch.py:180: batch_gather (from tensorflow.python.ops.array_ops) is deprecated and will be removed after 2017-10-25.
Instructions for updating:
`tf.batch_gather` is deprecated, please use `tf.gather` with `batch_dims` instead.
W0701 13:16:14.671800 140633166759744 deprecation.py:323] From /tmpfs/src/tf_docs_env/lib/python3.4/site-packages/tensorflow/python/ops/array_ops.py:1340: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
[[b'everything', b'not', b'saved', b'will', b'be', b'lost.'], [b'Sad\xe2\x98\xb9']]
```
### UnicodeScriptTokenizer
This tokenizer splits UTF-8 strings based on Unicode script boundaries. The script codes used correspond to International Components for Unicode (ICU) UScriptCode values. See: http://icu-project.org/apiref/icu4c/uscript_8h.html
In practice, this is similar to the `WhitespaceTokenizer` with the most apparent difference being that it will split punctuation (USCRIPT_COMMON) from language texts (eg. USCRIPT_LATIN, USCRIPT_CYRILLIC, etc) while also separating language texts from each other.
```py
tokenizer = text.UnicodeScriptTokenizer()
tokens = tokenizer.tokenize(['everything not saved will be lost.', u'Sad☹'.encode('UTF-8')])
print(tokens.to_list())
```
```py
[[b'everything', b'not', b'saved', b'will', b'be', b'lost', b'.'], [b'Sad', b'\xe2\x98\xb9']]
```
### Unicode split
When tokenizing languages without whitespace to segment words, it is common to just split by character, which can be accomplished using the [unicode_split](https://tensorflow.google.cn/api_docs/python/tf/strings/unicode_split) op found in core.
```py
tokens = tf.strings.unicode_split([u"仅今年前".encode('UTF-8')], 'UTF-8')
print(tokens.to_list())
```
```py
[[b'\xe4\xbb\x85', b'\xe4\xbb\x8a', b'\xe5\xb9\xb4', b'\xe5\x89\x8d']]
```
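The per-character split is easy to reproduce in plain Python, which makes the byte-level output above easier to interpret. This is a sketch of the behavior, not the op itself:

```python
# Pure-Python sketch of tf.strings.unicode_split for one string:
# break the text into Unicode code points, then re-encode each as UTF-8 bytes.
def unicode_split(text: str) -> list:
    return [ch.encode("utf-8") for ch in text]

tokens = unicode_split(u"仅今年前")
print(tokens)  # [b'\xe4\xbb\x85', b'\xe4\xbb\x8a', b'\xe5\xb9\xb4', b'\xe5\x89\x8d']
```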
### Offsets
When tokenizing strings, it is often desired to know where in the original string the token originated from. For this reason, each tokenizer which implements `TokenizerWithOffsets` has a *tokenize_with_offsets* method that will return the byte offsets along with the tokens. The offset_starts lists the bytes in the original string each token starts at, and the offset_limits lists the bytes where each token ends.
```py
tokenizer = text.UnicodeScriptTokenizer()
(tokens, offset_starts, offset_limits) = tokenizer.tokenize_with_offsets(['everything not saved will be lost.', u'Sad☹'.encode('UTF-8')])
print(tokens.to_list())
print(offset_starts.to_list())
print(offset_limits.to_list())
```
```py
[[b'everything', b'not', b'saved', b'will', b'be', b'lost', b'.'], [b'Sad', b'\xe2\x98\xb9']]
[[0, 11, 15, 21, 26, 29, 33], [0, 3]]
[[10, 14, 20, 25, 28, 33, 34], [3, 6]]
```
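The start/limit offsets above are plain byte positions, which can be reproduced with ordinary byte arithmetic. The sketch below mimics `tokenize_with_offsets` for the simple whitespace case only (the real `UnicodeScriptTokenizer` additionally splits on script boundaries, which is why `lost` and `.` are separate tokens above):

```python
# Pure-Python sketch of byte offsets for whitespace-delimited tokens.
def whitespace_tokenize_with_offsets(data: bytes):
    tokens, starts, limits = [], [], []
    pos = 0
    for tok in data.split():
        start = data.index(tok, pos)      # byte offset where the token begins
        tokens.append(tok)
        starts.append(start)
        limits.append(start + len(tok))   # byte offset just past the token
        pos = start + len(tok)
    return tokens, starts, limits

print(whitespace_tokenize_with_offsets(b'everything not saved will be lost.'))
```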
### TF.Data Example
Tokenizers work as expected with the tf.data API. A simple example is provided below.
```py
docs = tf.data.Dataset.from_tensor_slices([['Never tell me the odds.'], ["It's a trap!"]])
tokenizer = text.WhitespaceTokenizer()
tokenized_docs = docs.map(lambda x: tokenizer.tokenize(x))
iterator = iter(tokenized_docs)
print(next(iterator).to_list())
print(next(iterator).to_list())
```
```py
[[b'Never', b'tell', b'me', b'the', b'odds.']]
[[b"It's", b'a', b'trap!']]
```
## Other Text Ops
TF.Text packages other useful preprocessing ops. We will review a couple below.
### Wordshape
A common feature used in some natural language understanding models is to see if the text string has a certain property. For example, a sentence breaking model might contain features which check for word capitalization or if a punctuation character is at the end of a string.
Wordshape defines a variety of useful regular expression based helper functions for matching various relevant patterns in your input text. Here are a few examples.
```py
tokenizer = text.WhitespaceTokenizer()
tokens = tokenizer.tokenize(['Everything not saved will be lost.', u'Sad☹'.encode('UTF-8')])
# Is capitalized?
f1 = text.wordshape(tokens, text.WordShape.HAS_TITLE_CASE)
# Are all letters uppercased?
f2 = text.wordshape(tokens, text.WordShape.IS_UPPERCASE)
# Does the token contain punctuation?
f3 = text.wordshape(tokens, text.WordShape.HAS_SOME_PUNCT_OR_SYMBOL)
# Is the token a number?
f4 = text.wordshape(tokens, text.WordShape.IS_NUMERIC_VALUE)
print(f1.to_list())
print(f2.to_list())
print(f3.to_list())
print(f4.to_list())
```
```py
[[True, False, False, False, False, False], [True]]
[[False, False, False, False, False, False], [False]]
[[False, False, False, False, False, True], [True]]
[[False, False, False, False, False, False], [False]]
```
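These shape checks are ordinary predicates over the token text. Below is a rough pure-Python equivalent of two of them, for intuition only — the real `WordShape` patterns are Unicode-aware regular expressions maintained by TF.Text:

```python
import unicodedata

# Rough pure-Python equivalents of two WordShape checks (illustrative only).
def has_title_case(token: str) -> bool:
    # First character is an uppercase letter, as in "Everything" or "Sad".
    return len(token) > 0 and token[0].isupper()

def has_some_punct_or_symbol(token: str) -> bool:
    # True if any character is Unicode punctuation (P*) or symbol (S*).
    return any(unicodedata.category(ch)[0] in ("P", "S") for ch in token)

tokens = ["Everything", "not", "saved", "will", "be", "lost.", "Sad☹"]
print([has_title_case(t) for t in tokens])
print([has_some_punct_or_symbol(t) for t in tokens])
```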
### N-grams & Sliding Window
N-grams are sequential words given a sliding window size of *n*. When combining the tokens, there are three reduction mechanisms supported. For text, you would want to use `Reduction.STRING_JOIN`, which appends the strings to each other. The default separator character is a space, but this can be changed with the `string_separator` argument.
The other two reduction methods are most often used with numerical values; these are `Reduction.SUM` and `Reduction.MEAN`.
```py
tokenizer = text.WhitespaceTokenizer()
tokens = tokenizer.tokenize(['Everything not saved will be lost.', u'Sad☹'.encode('UTF-8')])
# Ngrams, in this case bi-gram (n = 2)
bigrams = text.ngrams(tokens, 2, reduction_type=text.Reduction.STRING_JOIN)
print(bigrams.to_list())
```
```py
[[b'Everything not', b'not saved', b'saved will', b'will be', b'be lost.'], []]
```
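The `STRING_JOIN` reduction is easy to state precisely: slide a window of size *n* over each token list and join each window with the separator. A pure-Python sketch of that behavior (the empty list for `Sad☹` falls out naturally, since a single token yields no bigrams):

```python
# Pure-Python sketch of text.ngrams(..., reduction_type=STRING_JOIN).
def string_join_ngrams(tokens, n, separator=b" "):
    # A row with fewer than n tokens produces no n-grams at all.
    return [separator.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

ragged = [[b'Everything', b'not', b'saved', b'will', b'be', b'lost.'], [b'Sad\xe2\x98\xb9']]
print([string_join_ngrams(row, 2) for row in ragged])
# [[b'Everything not', b'not saved', b'saved will', b'will be', b'be lost.'], []]
```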
# TFRecord and tf.Example
> Original: [https://tensorflow.google.cn/tutorials/load_data/tfrecord](https://tensorflow.google.cn/tutorials/load_data/tfrecord)
To read data efficiently, it can be helpful to serialize your data and store it in a set of files (100-200 MB each) that can each be read linearly. This is especially true if the data is being streamed over a network. It can also be useful for caching any data preprocessing.
The TFRecord format is a simple format for storing a sequence of binary records.
[Protocol buffers](https://developers.google.cn/protocol-buffers/) are a cross-platform, cross-language library for efficient serialization of structured data.
Protocol messages are defined by `.proto` files; these are often the easiest way to understand a message type.
The `tf.Example` message (or protobuf) is a flexible message type that represents a `{"string": value}` mapping. It is designed for use with TensorFlow and is used throughout higher-level APIs such as [TFX](https://tensorflow.google.cn/tfx/).
This notebook demonstrates how to create, parse, and use the `tf.Example` message, and then how to serialize, write, and read `tf.Example` messages to and from `.tfrecord` files.
Note: While useful, these structures are optional. There is no need to convert existing code to use TFRecords, unless you are using [tf.data](https://tensorflow.google.cn/guide/datasets) and reading data is still the bottleneck to training. See [Data Input Pipeline Performance](https://tensorflow.google.cn/guide/performance/datasets) for dataset performance tips.
## Setup
```py
import tensorflow as tf
import numpy as np
import IPython.display as display
```
## `tf.Example`
### Data types for `tf.Example`
Fundamentally, a `tf.Example` is a `{"string": tf.train.Feature}` mapping.
The [`tf.train.Feature`](https://tensorflow.google.cn/api_docs/python/tf/train/Feature) message type can accept one of the following three types (see the [`.proto` file](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/example/feature.proto) for reference). Most other generic types can be coerced into one of these:
1. [`tf.train.BytesList`](https://tensorflow.google.cn/api_docs/python/tf/train/BytesList) (the following types can be coerced)
* `string`
* `byte`
1. [`tf.train.FloatList`](https://tensorflow.google.cn/api_docs/python/tf/train/FloatList) (the following types can be coerced)
* `float` (`float32`)
* `double` (`float64`)
1. [`tf.train.Int64List`](https://tensorflow.google.cn/api_docs/python/tf/train/Int64List) (the following types can be coerced)
* `bool`
* `enum`
* `int32`
* `uint32`
* `int64`
* `uint64`
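The coercion table above can be summarized as a small dispatch rule. Here is an illustrative pure-Python sketch (note that `bool` must be tested before `int`, because in Python `bool` is a subclass of `int`):

```python
# Illustrative dispatch from Python value types to tf.train.Feature list kinds.
def feature_list_kind(value):
    if isinstance(value, (bytes, str)):
        return "bytes_list"
    if isinstance(value, bool):        # bool first: bool is a subclass of int
        return "int64_list"
    if isinstance(value, float):
        return "float_list"
    if isinstance(value, int):
        return "int64_list"
    raise TypeError(f"no coercion for {type(value).__name__}")

print(feature_list_kind(b"goat"), feature_list_kind(1.0), feature_list_kind(True))
```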
In order to convert a standard TensorFlow type to a `tf.Example`-compatible [`tf.train.Feature`](https://tensorflow.google.cn/api_docs/python/tf/train/Feature), you can use the shortcut functions below. Note that each function takes a scalar input value and returns a [`tf.train.Feature`](https://tensorflow.google.cn/api_docs/python/tf/train/Feature) containing one of the three `list` types above:
```py
# The following functions can be used to convert a value to a type compatible
# with tf.Example.
def _bytes_feature(value):
"""Returns a bytes_list from a string / byte."""
if isinstance(value, type(tf.constant(0))):
value = value.numpy() # BytesList won't unpack a string from an EagerTensor.
return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
def _float_feature(value):
"""Returns a float_list from a float / double."""
return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))
def _int64_feature(value):
"""Returns an int64_list from a bool / enum / int / uint."""
return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
```
Note: To stay simple, this example only uses scalar inputs. The simplest way to handle non-scalar features is to use [`tf.io.serialize_tensor`](https://tensorflow.google.cn/api_docs/python/tf/io/serialize_tensor) to convert tensors to binary strings. Strings are scalars in TensorFlow. Use [`tf.io.parse_tensor`](https://tensorflow.google.cn/api_docs/python/tf/io/parse_tensor) to convert the binary string back to a tensor.
Below are some examples of how these functions work. Note the varying input types and the standardized output types. If the input type for a function does not match one of the coercible types stated above, the function will raise an exception (e.g. `_int64_feature(1.0)` will error out because `1.0` is a float, so it should be used with the `_float_feature` function instead):
```py
print(_bytes_feature(b'test_string'))
print(_bytes_feature(u'test_bytes'.encode('utf-8')))
print(_float_feature(np.exp(1)))
print(_int64_feature(True))
print(_int64_feature(1))
```
```py
bytes_list {
value: "test_string"
}
bytes_list {
value: "test_bytes"
}
float_list {
value: 2.7182817459106445
}
int64_list {
value: 1
}
int64_list {
value: 1
}
```
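Notice that `np.exp(1)` (2.718281828…) prints back as `2.7182817459106445`: `FloatList` stores 32-bit floats, so the value is rounded to the nearest float32. The round-trip can be reproduced with the standard `struct` module:

```python
import math
import struct

# Round-trip a Python float (64-bit) through a 32-bit float, as FloatList does.
def to_float32(x: float) -> float:
    return struct.unpack("<f", struct.pack("<f", x))[0]

print(to_float32(math.e))  # 2.7182817459106445
```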
All proto messages can be serialized to a binary string using the `.SerializeToString` method:
```py
feature = _float_feature(np.exp(1))
feature.SerializeToString()
```
```py
b'\x12\x06\n\x04T\xf8-@'
```
### Creating a `tf.Example` message
Suppose you want to create a `tf.Example` message from existing data. In practice, the dataset may come from anywhere, but the procedure of creating the `tf.Example` message from a single observation will be the same:
1. Within each observation, each value needs to be converted to a [`tf.train.Feature`](https://tensorflow.google.cn/api_docs/python/tf/train/Feature) containing one of the 3 compatible types, using one of the functions above.
2. You create a map (dictionary) from the feature name string to the encoded feature value produced in step 1.
3. The map produced in step 2 is converted to a [`Features` message](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/example/feature.proto#L85).
In this notebook, you will create a dataset using NumPy.
This dataset will have 4 features:
* a boolean feature, `False` or `True` with equal probability
* an integer feature uniformly randomly chosen from `[0, 5)`
* a string feature generated from a string table by using the integer feature as an index
* a float feature from a standard normal distribution
Consider a sample consisting of 10,000 independently and identically distributed observations from each of the above distributions:
```py
# The number of observations in the dataset.
n_observations = int(1e4)
# Boolean feature, encoded as False or True.
feature0 = np.random.choice([False, True], n_observations)
# Integer feature, random from 0 to 4.
feature1 = np.random.randint(0, 5, n_observations)
# String feature
strings = np.array([b'cat', b'dog', b'chicken', b'horse', b'goat'])
feature2 = strings[feature1]
# Float feature, from a standard normal distribution
feature3 = np.random.randn(n_observations)
```
Each of these features can be coerced into a `tf.Example`-compatible type using one of `_bytes_feature`, `_float_feature`, `_int64_feature`. You can then create a `tf.Example` message from these encoded features:
```py
def serialize_example(feature0, feature1, feature2, feature3):
"""
Creates a tf.Example message ready to be written to a file.
"""
# Create a dictionary mapping the feature name to the tf.Example-compatible
# data type.
feature = {
'feature0': _int64_feature(feature0),
'feature1': _int64_feature(feature1),
'feature2': _bytes_feature(feature2),
'feature3': _float_feature(feature3),
}
# Create a Features message using tf.train.Example.
example_proto = tf.train.Example(features=tf.train.Features(feature=feature))
return example_proto.SerializeToString()
```
For example, suppose you have a single observation from the dataset, `[False, 4, bytes('goat'), 0.9876]`. You can create and print the `tf.Example` message for this observation using `serialize_example()`. Each single observation will be written as a `Features` message, as per the above. Note that the `tf.Example` [message](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/example/example.proto#L88) is just a wrapper around the `Features` message:
```py
# This is an example observation from the dataset.
example_observation = []
serialized_example = serialize_example(False, 4, b'goat', 0.9876)
serialized_example
```
```py
b'\nR\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04[\xd3|?\n\x14\n\x08feature2\x12\x08\n\x06\n\x04goat\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x00\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x04'
```
To decode the message, use the [`tf.train.Example.FromString`](https://tensorflow.google.cn/api_docs/python/tf/train/Example#FromString) method.
```py
example_proto = tf.train.Example.FromString(serialized_example)
example_proto
```
```py
features {
feature {
key: "feature0"
value {
int64_list {
value: 0
}
}
}
feature {
key: "feature1"
value {
int64_list {
value: 4
}
}
}
feature {
key: "feature2"
value {
bytes_list {
value: "goat"
}
}
}
feature {
key: "feature3"
value {
float_list {
value: 0.9876000285148621
}
}
}
}
```
## TFRecords format details
A TFRecord file contains a sequence of records. The file can only be read sequentially.
Each record contains a byte-string for the data payload, plus the data length, and CRC32C (a 32-bit CRC using the Castagnoli polynomial) hashes for integrity checking.
Each record is stored in the following format:
```
uint64 length
uint32 masked_crc32_of_length
byte   data[length]
uint32 masked_crc32_of_data
```
The records are concatenated together to produce the file. CRCs are [described here](https://en.wikipedia.org/wiki/Cyclic_redundancy_check), and the mask of a CRC is:
```py
masked_crc = ((crc >> 15) | (crc << 17)) + 0xa282ead8ul
```
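The masking step can be written out directly. The sketch below implements the mask (and its inverse) in pure Python; computing the actual CRC32C value would additionally require a Castagnoli-polynomial CRC implementation, which is assumed to exist elsewhere and is not shown here:

```python
# The TFRecord CRC mask: rotate the 32-bit CRC right by 15 bits, then add a
# constant. All arithmetic is modulo 2**32.
MASK_DELTA = 0xA282EAD8

def mask_crc(crc: int) -> int:
    return (((crc >> 15) | (crc << 17)) + MASK_DELTA) & 0xFFFFFFFF

def unmask_crc(masked: int) -> int:
    rot = (masked - MASK_DELTA) & 0xFFFFFFFF
    # Undo the rotation: rotate right by 17 == rotate left by 15.
    return ((rot >> 17) | (rot << 15)) & 0xFFFFFFFF

crc = 0x12345678
assert unmask_crc(mask_crc(crc)) == crc
print(hex(mask_crc(0)))  # 0xa282ead8
```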
Note: There is no requirement to use `tf.Example` in TFRecord files. `tf.Example` is just a method of serializing dictionaries to byte-strings. Lines of text, encoded image data, or serialized tensors (written with [`tf.io.serialize_tensor`](https://tensorflow.google.cn/api_docs/python/tf/io/serialize_tensor) and read back with [`tf.io.parse_tensor`](https://tensorflow.google.cn/api_docs/python/tf/io/parse_tensor) when loading) all work as record payloads. See the [`tf.io`](https://tensorflow.google.cn/api_docs/python/tf/io) module for more options.
## TFRecord files using [`tf.data`](https://tensorflow.google.cn/api_docs/python/tf/data)
The [`tf.data`](https://tensorflow.google.cn/api_docs/python/tf/data) module also provides tools for reading and writing data in TensorFlow.
### Writing a TFRecord file
The easiest way to get the data into a dataset is to use the `from_tensor_slices` method.
Applied to an array, it returns a dataset of scalars:
```py
tf.data.Dataset.from_tensor_slices(feature1)
```
```py
<TensorSliceDataset shapes: (), types: tf.int64>
```
Applied to a tuple of arrays, it returns a dataset of tuples:
```py
features_dataset = tf.data.Dataset.from_tensor_slices((feature0, feature1, feature2, feature3))
features_dataset
```
```py
<TensorSliceDataset shapes: ((), (), (), ()), types: (tf.bool, tf.int64, tf.string, tf.float64)>
```
```py
# Use `take(1)` to only pull one example from the dataset.
for f0,f1,f2,f3 in features_dataset.take(1):
print(f0)
print(f1)
print(f2)
print(f3)
```
```py
tf.Tensor(False, shape=(), dtype=bool)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(b'dog', shape=(), dtype=string)
tf.Tensor(-0.07658295354196158, shape=(), dtype=float64)
```
Use the [`tf.data.Dataset.map`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#map) method to apply a function to each element of a `Dataset`.
The mapped function must operate in TensorFlow graph mode: it must operate on and return `tf.Tensors`. A non-tensor function, like `serialize_example`, can be wrapped with [`tf.py_function`](https://tensorflow.google.cn/api_docs/python/tf/py_function) to make it compatible.
Using [`tf.py_function`](https://tensorflow.google.cn/api_docs/python/tf/py_function) requires that you specify the shape and type information that is otherwise unavailable:
```py
def tf_serialize_example(f0,f1,f2,f3):
tf_string = tf.py_function(
serialize_example,
(f0,f1,f2,f3), # pass these args to the above function.
tf.string) # the return type is `tf.string`.
return tf.reshape(tf_string, ()) # The result is a scalar
```
```py
tf_serialize_example(f0,f1,f2,f3)
```
```py
<tf.Tensor: shape=(), dtype=string, numpy=b'\nQ\n\x13\n\x08feature2\x12\x07\n\x05\n\x03dog\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x01\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\x86\xd7\x9c\xbd\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x00'>
```
Apply this function to each element in the dataset:
```py
serialized_features_dataset = features_dataset.map(tf_serialize_example)
serialized_features_dataset
```
```py
<MapDataset shapes: (), types: tf.string>
```
As an alternative to wrapping `serialize_example` with `tf.py_function`, you can also produce the serialized dataset from a plain Python generator:
```py
def generator():
for features in features_dataset:
yield serialize_example(*features)
```
```py
serialized_features_dataset = tf.data.Dataset.from_generator(
generator, output_types=tf.string, output_shapes=())
```
```py
serialized_features_dataset
```
```py
<FlatMapDataset shapes: (), types: tf.string>
```
And write them to a TFRecord file:
```py
filename = 'test.tfrecord'
writer = tf.data.experimental.TFRecordWriter(filename)
writer.write(serialized_features_dataset)
```
### Reading a TFRecord file
You can also read the TFRecord file using the [`tf.data.TFRecordDataset`](https://tensorflow.google.cn/api_docs/python/tf/data/TFRecordDataset) class.
More information on consuming TFRecord files using [`tf.data`](https://tensorflow.google.cn/api_docs/python/tf/data) can be found [here](https://tensorflow.google.cn/guide/datasets#consuming_tfrecord_data).
Using `TFRecordDataset`s can be useful for standardizing input data and optimizing performance.
```py
filenames = [filename]
raw_dataset = tf.data.TFRecordDataset(filenames)
raw_dataset
```
```py
<TFRecordDatasetV2 shapes: (), types: tf.string>
```
At this point the dataset contains serialized [`tf.train.Example`](https://tensorflow.google.cn/api_docs/python/tf/train/Example) messages. When iterated over, it returns these as scalar string tensors.
Use the `.take` method to only show the first 10 records.
Note: Iterating over a [`tf.data.Dataset`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset) only works with eager execution enabled.
```py
for raw_record in raw_dataset.take(10):
print(repr(raw_record))
```
```py
<tf.Tensor: shape=(), dtype=string, numpy=b'\nQ\n\x13\n\x08feature2\x12\x07\n\x05\n\x03dog\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x00\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\x86\xd7\x9c\xbd\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x01'>
<tf.Tensor: shape=(), dtype=string, numpy=b'\nQ\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x00\n\x13\n\x08feature2\x12\x07\n\x05\n\x03dog\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\xa2\x97\xeb=\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x01'>
<tf.Tensor: shape=(), dtype=string, numpy=b'\nQ\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04!a]?\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x00\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x00\n\x13\n\x08feature2\x12\x07\n\x05\n\x03cat'>
<tf.Tensor: shape=(), dtype=string, numpy=b'\nQ\n\x13\n\x08feature2\x12\x07\n\x05\n\x03dog\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\xc4\x84`\xbf\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x01'>
<tf.Tensor: shape=(), dtype=string, numpy=b'\nR\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x00\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x04\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\xce\xb28\xbe\n\x14\n\x08feature2\x12\x08\n\x06\n\x04goat'>
<tf.Tensor: shape=(), dtype=string, numpy=b'\nU\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04P\xc4\x94?\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x02\n\x17\n\x08feature2\x12\x0b\n\t\n\x07chicken'>
<tf.Tensor: shape=(), dtype=string, numpy=b'\nQ\n\x13\n\x08feature2\x12\x07\n\x05\n\x03cat\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\xa5\xc8\xea>\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x00\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x00'>
<tf.Tensor: shape=(), dtype=string, numpy=b'\nU\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x02\n\x17\n\x08feature2\x12\x0b\n\t\n\x07chicken\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\xda\x16[?'>
<tf.Tensor: shape=(), dtype=string, numpy=b'\nS\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04nmO?\n\x15\n\x08feature2\x12\t\n\x07\n\x05horse\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x03'>
<tf.Tensor: shape=(), dtype=string, numpy=b'\nU\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x00\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\x8c \x8d\xbf\n\x17\n\x08feature2\x12\x0b\n\t\n\x07chicken\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x02'>
```
These tensors can be parsed using the function below. Note that the `feature_description` is necessary here because datasets use graph execution, and need this description to build their shape and type signature:
```py
# Create a description of the features.
feature_description = {
'feature0': tf.io.FixedLenFeature([], tf.int64, default_value=0),
'feature1': tf.io.FixedLenFeature([], tf.int64, default_value=0),
'feature2': tf.io.FixedLenFeature([], tf.string, default_value=''),
'feature3': tf.io.FixedLenFeature([], tf.float32, default_value=0.0),
}
def _parse_function(example_proto):
# Parse the input `tf.Example` proto using the dictionary above.
return tf.io.parse_single_example(example_proto, feature_description)
```
Alternatively, use [`tf.io.parse_example`](https://tensorflow.google.cn/api_docs/python/tf/io/parse_example) to parse a whole batch at once. Apply this function to each item in the dataset using the [`tf.data.Dataset.map`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#map) method:
```py
parsed_dataset = raw_dataset.map(_parse_function)
parsed_dataset
```
```py
<MapDataset shapes: {feature0: (), feature1: (), feature2: (), feature3: ()}, types: {feature0: tf.int64, feature1: tf.int64, feature2: tf.string, feature3: tf.float32}>
```
Use eager execution to display the observations in the dataset. There are 10,000 observations in this dataset, but you will only display the first 10. The data is displayed as a dictionary of features. Each item is a [`tf.Tensor`](https://tensorflow.google.cn/api_docs/python/tf/Tensor), and the `numpy` element of this tensor displays the value of the feature:
```py
for parsed_record in parsed_dataset.take(10):
print(repr(parsed_record))
```
```py
{'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=0>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'dog'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=-0.07658295>}
{'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=0>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'dog'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=0.11503531>}
{'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=0>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=0>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'cat'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=0.8647633>}
{'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'dog'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=-0.87702584>}
{'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=0>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=4>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'goat'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=-0.18036959>}
{'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=2>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'chicken'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=1.162241>}
{'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=0>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=0>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'cat'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=0.45856205>}
{'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=2>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'chicken'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=0.85581744>}
{'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=3>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'horse'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=0.8102635>}
{'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=0>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=2>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'chicken'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=-1.1025558>}
```
Here, the `tf.io.parse_example` function unpacks the `tf.Example` fields into standard tensors.
## TFRecord files in Python
The [`tf.io`](https://tensorflow.google.cn/api_docs/python/tf/io) module also contains pure-Python functions for reading and writing TFRecord files.
### Writing a TFRecord file
Next, write the 10,000 observations to the file `test.tfrecord`. Each observation is converted to a `tf.Example` message, then written to file. You can then verify that the file `test.tfrecord` has been created:
```py
# Write the `tf.Example` observations to the file.
with tf.io.TFRecordWriter(filename) as writer:
for i in range(n_observations):
example = serialize_example(feature0[i], feature1[i], feature2[i], feature3[i])
writer.write(example)
```
```py
du -sh {filename}
```
```py
984K test.tfrecord
```
### Reading a TFRecord file
These serialized tensors can be easily parsed using [`tf.train.Example.ParseFromString`](https://tensorflow.google.cn/api_docs/python/tf/train/Example#ParseFromString):
```py
filenames = [filename]
raw_dataset = tf.data.TFRecordDataset(filenames)
raw_dataset
```
```py
<TFRecordDatasetV2 shapes: (), types: tf.string>
```
```py
for raw_record in raw_dataset.take(1):
example = tf.train.Example()
example.ParseFromString(raw_record.numpy())
print(example)
```
```py
features {
feature {
key: "feature0"
value {
int64_list {
value: 0
}
}
}
feature {
key: "feature1"
value {
int64_list {
value: 1
}
}
}
feature {
key: "feature2"
value {
bytes_list {
value: "dog"
}
}
}
feature {
key: "feature3"
value {
float_list {
value: -0.07658295333385468
}
}
}
}
```
## Walkthrough: Reading and writing image data
This is an end-to-end example of how to read and write image data using TFRecords. Using an image as input data, you will write the data as a TFRecord file, then read the file back and display the image.
This can be useful if, for example, you want to use several models on the same input dataset. Instead of storing the image data raw, it can be preprocessed into the TFRecord format, and that can be used in all further processing and modeling.
First, let's download [this image](https://commons.wikimedia.org/wiki/File:Felis_catus-cat_on_snow.jpg) of a cat in the snow and [this photo](https://upload.wikimedia.org/wikipedia/commons/f/fe/New_East_River_Bridge_from_Brooklyn_det.4a09796u.jpg) of the Williamsburg Bridge, NYC under construction.
### Fetch the images
```py
cat_in_snow = tf.keras.utils.get_file('320px-Felis_catus-cat_on_snow.jpg', 'https://storage.googleapis.com/download.tensorflow.org/example_images/320px-Felis_catus-cat_on_snow.jpg')
williamsburg_bridge = tf.keras.utils.get_file('194px-New_East_River_Bridge_from_Brooklyn_det.4a09796u.jpg','https://storage.googleapis.com/download.tensorflow.org/example_images/194px-New_East_River_Bridge_from_Brooklyn_det.4a09796u.jpg')
```
```py
Downloading data from https://storage.googleapis.com/download.tensorflow.org/example_images/320px-Felis_catus-cat_on_snow.jpg
24576/17858 [=========================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/download.tensorflow.org/example_images/194px-New_East_River_Bridge_from_Brooklyn_det.4a09796u.jpg
16384/15477 [===============================] - 0s 0us/step
```
```py
display.display(display.Image(filename=cat_in_snow))
display.display(display.HTML('Image cc-by: <a href="https://commons.wikimedia.org/wiki/File:Felis_catus-cat_on_snow.jpg">Von.grzanka</a>'))
```
![jpeg](img/e8d23da7a633c8eaa5878bca988b63f3.png)
```py
display.display(display.Image(filename=williamsburg_bridge))
display.display(display.HTML('<a href="https://commons.wikimedia.org/wiki/File:New_East_River_Bridge_from_Brooklyn_det.4a09796u.jpg">From Wikimedia</a>'))
```
![jpeg](img/47e128c5852147da0f7b0158465fe752.png)
### Write the TFRecord file
As before, encode the features as types compatible with `tf.Example`. This stores the raw image string feature, as well as the height, width, depth, and an arbitrary `label` feature. The latter is used when you write the file to distinguish between the cat image and the bridge image. Use `0` for the cat image, and `1` for the bridge image:
```py
image_labels = {
cat_in_snow : 0,
williamsburg_bridge : 1,
}
```
```py
# This is an example, just using the cat image.
image_string = open(cat_in_snow, 'rb').read()
label = image_labels[cat_in_snow]
# Create a dictionary with features that may be relevant.
def image_example(image_string, label):
image_shape = tf.image.decode_jpeg(image_string).shape
feature = {
'height': _int64_feature(image_shape[0]),
'width': _int64_feature(image_shape[1]),
'depth': _int64_feature(image_shape[2]),
'label': _int64_feature(label),
'image_raw': _bytes_feature(image_string),
}
return tf.train.Example(features=tf.train.Features(feature=feature))
for line in str(image_example(image_string, label)).split('\n')[:15]:
print(line)
print('...')
```
```py
features {
feature {
key: "depth"
value {
int64_list {
value: 3
}
}
}
feature {
key: "height"
value {
int64_list {
value: 213
}
...
```
Notice that all of the features are now stored in the `tf.Example` message. Next, functionalize the code above and write the example messages to a file named `images.tfrecords`:
```py
# Write the raw image files to `images.tfrecords`.
# First, process the two images into `tf.Example` messages.
# Then, write to a `.tfrecords` file.
record_file = 'images.tfrecords'
with tf.io.TFRecordWriter(record_file) as writer:
for filename, label in image_labels.items():
image_string = open(filename, 'rb').read()
tf_example = image_example(image_string, label)
writer.write(tf_example.SerializeToString())
```
```py
du -sh {record_file}
```
```py
36K images.tfrecords
```
### Read the TFRecord file
You now have the file `images.tfrecords` and can iterate over the records in it to read back what you wrote. Given that in this example you will only reproduce the image, the only feature you need is the raw image string. Extract it using the getter methods described above, namely `example.features.feature['image_raw'].bytes_list.value[0]`. You can also use the labels to determine which record is the cat and which one is the bridge:
```py
raw_image_dataset = tf.data.TFRecordDataset('images.tfrecords')
# Create a dictionary describing the features.
image_feature_description = {
'height': tf.io.FixedLenFeature([], tf.int64),
'width': tf.io.FixedLenFeature([], tf.int64),
'depth': tf.io.FixedLenFeature([], tf.int64),
'label': tf.io.FixedLenFeature([], tf.int64),
'image_raw': tf.io.FixedLenFeature([], tf.string),
}
def _parse_image_function(example_proto):
# Parse the input tf.Example proto using the dictionary above.
return tf.io.parse_single_example(example_proto, image_feature_description)
parsed_image_dataset = raw_image_dataset.map(_parse_image_function)
parsed_image_dataset
```
```py
<MapDataset shapes: {depth: (), height: (), image_raw: (), label: (), width: ()}, types: {depth: tf.int64, height: tf.int64, image_raw: tf.string, label: tf.int64, width: tf.int64}>
```
Recover the images from the TFRecord file:
```py
for image_features in parsed_image_dataset:
image_raw = image_features['image_raw'].numpy()
display.display(display.Image(data=image_raw))
```
![jpeg](img/36943305bc87e9d7bacdd3122d2620ca.png)
![jpeg](img/9a244f6224055e7727787fe289c2ca7c.png)
# Estimator
# Premade Estimators
> Original: [https://tensorflow.google.cn/tutorials/estimator/premade](https://tensorflow.google.cn/tutorials/estimator/premade)
**Note:** The TensorFlow community has translated these documents. Because community translations are best-effort, there is no guarantee that they are accurate or up to date with the latest [official English documentation](https://tensorflow.google.cn/?hl=en). If you have suggestions to improve this translation, please submit a pull request to the [tensorflow/docs](https://github.com/tensorflow/docs) GitHub repository. To volunteer to write or review translations, join the [docs-zh-cn@tensorflow.org Google Group](https://groups.google.com/a/tensorflow.org/forum/#!forum/docs-zh-cn).
This tutorial shows you how to solve the Iris classification problem in TensorFlow using Estimators. An Estimator is TensorFlow's high-level representation of a complete model, and it has been designed for easy scaling and asynchronous training. For more details see [Estimators](https://tensorflow.google.cn/guide/estimator).
Note that in TensorFlow 2.0, the [Keras API](https://tensorflow.google.cn/guide/keras) can accomplish many of these same tasks, and is believed to be an easier API to learn. If you are starting fresh, we recommend you start with Keras. For more information about the available high-level APIs in TensorFlow 2.0, see [Standardizing on Keras](https://medium.com/tensorflow/standardizing-on-keras-guidance-on-high-level-apis-in-tensorflow-2-0-bad2b04c819a).
## First things first
To get started, you will first import TensorFlow and a number of libraries you will need.
```py
import tensorflow as tf
import pandas as pd
import numpy as np  # used by the sample input function below
```
## The dataset
The sample program in this document builds and tests a model that classifies Iris flowers into three different species based on the size of their [sepals](https://en.wikipedia.org/wiki/Sepal) and [petals](https://en.wikipedia.org/wiki/Petal).
You will train a model using the Iris dataset. The dataset contains four features and one [label](https://developers.google.cn/machine-learning/glossary/#label). The four features identify the following botanical characteristics of individual Iris flowers:
* sepal length
* sepal width
* petal length
* petal width
Based on this information, you can define a few helpful constants for parsing the data:
```py
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']
```
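A quick pure-Python illustration of what these constants describe — each CSV row holds the four measurements followed by an integer species label (the row values here are made up for the example):

```python
import csv
import io

CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

# Parse one made-up CSV row into a features dict and a species name.
row = next(csv.reader(io.StringIO("6.4,2.8,5.6,2.2,2")))
features = {name: float(v) for name, v in zip(CSV_COLUMN_NAMES[:-1], row[:-1])}
label = SPECIES[int(row[-1])]
print(features, label)
```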
Next, download and parse the Iris dataset using Keras and Pandas. Note that you keep distinct datasets for training and testing.
```py
train_path = tf.keras.utils.get_file(
"iris_training.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv")
test_path = tf.keras.utils.get_file(
"iris_test.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv")
train = pd.read_csv(train_path, names=CSV_COLUMN_NAMES, header=0)
test = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0)
```
You can inspect your data to see that you have four float feature columns and one int32 label.
```py
train.head()
```
For each of the datasets, split out the labels, which the model will be trained to predict.
```py
train_y = train.pop('Species')
test_y = test.pop('Species')
# The label column has now been removed from the data.
train.head()
```
## Overview of programming with Estimators
Now that you have the data set up, you can define a model using a TensorFlow Estimator. An Estimator is any class derived from [`tf.estimator.Estimator`](https://tensorflow.google.cn/api_docs/python/tf/estimator/Estimator). TensorFlow provides a collection of [`tf.estimator`](https://tensorflow.google.cn/api_docs/python/tf/estimator) classes (for example, `LinearRegressor`) to implement common machine learning algorithms. Beyond those, you may write your own [custom Estimators](https://tensorflow.google.cn/guide/custom_estimators). We recommend using premade Estimators when just getting started.
To write a TensorFlow program based on premade Estimators, you must perform the following tasks:
* Create one or more input functions.
* Define the model's feature columns.
* Instantiate an Estimator, specifying the feature columns and various hyperparameters.
* Call one or more methods on the Estimator object, passing the appropriate input function as the source of the data.
Let's see how those tasks are implemented for Iris classification.
## Create input functions
You must create input functions to supply data for training, evaluating, and prediction.
An **input function** is a function that returns a [`tf.data.Dataset`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset) object which outputs the following two-element tuple:
* [`features`](https://developers.google.cn/machine-learning/glossary/#feature) — a Python dictionary in which:
    * Each key is the name of a feature.
    * Each value is an array containing all of that feature's values.
* `label` — an array containing the values of the [label](https://developers.google.cn/machine-learning/glossary/#label) for every example.
Just to demonstrate the format of the input function, here's a simple implementation:
```py
def input_evaluation_set():
features = {'SepalLength': np.array([6.4, 5.0]),
'SepalWidth': np.array([2.8, 2.3]),
'PetalLength': np.array([5.6, 3.3]),
'PetalWidth': np.array([2.2, 1.0])}
labels = np.array([2, 1])
return features, labels
```
Your input function may generate the `features` dictionary and `label` list any way you like. However, we recommend using TensorFlow's [Dataset API](https://tensorflow.google.cn/guide/datasets), which can parse all sorts of data.
The Dataset API can handle a lot of common cases for you. For example, using the Dataset API, you can easily read in records from a large collection of files in parallel and join them into a single stream.
To keep things simple in this example, you are going to load the data with [pandas](https://pandas.pydata.org/), and build an input pipeline from this in-memory data.
```py
def input_fn(features, labels, training=True, batch_size=256):
"""An input function for training or evaluating"""
# Convert the inputs to a Dataset.
dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
# Shuffle and repeat if you are in training mode.
if training:
dataset = dataset.shuffle(1000).repeat()
return dataset.batch(batch_size)
```
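A quick way to see what `shuffle` and `repeat` change is to compare the two modes on a tiny stand-in for the real pandas DataFrames (a toy check, not in the original tutorial):

```python
import numpy as np
import tensorflow as tf

def input_fn(features, labels, training=True, batch_size=2):
    """Same input function as above, with a small default batch size."""
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    if training:
        dataset = dataset.shuffle(1000).repeat()
    return dataset.batch(batch_size)

toy_x = {'SepalLength': np.array([6.4, 5.0, 4.9])}
toy_y = np.array([2, 1, 0])

# Evaluation mode is finite: 3 examples at batch_size=2 -> 2 batches.
print(len(list(input_fn(toy_x, toy_y, training=False))))
# Training mode repeats forever, so take() is needed to stop iteration.
print(len(list(input_fn(toy_x, toy_y, training=True).take(5))))
```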
## Define the feature columns
A [**feature column**](https://developers.google.cn/machine-learning/glossary/#feature_columns) is an object describing how the model should use raw input data from the features dictionary. When you build an Estimator model, you pass it a list of feature columns that describes each of the features you want the model to use. The [`tf.feature_column`](https://tensorflow.google.cn/api_docs/python/tf/feature_column) module provides many options for representing data to the model.
For Iris, the 4 raw features are numeric values, so we'll build a list of feature columns to tell the Estimator model to represent each of the four features as 32-bit floating-point values. Therefore, the code to create the feature columns is:
```py
# Feature columns describe how to use the input.
my_feature_columns = []
for key in train.keys():
my_feature_columns.append(tf.feature_column.numeric_column(key=key))
```
Feature columns can be far more sophisticated than the ones shown above. You can read more about feature columns in [this guide](https://tensorflow.google.cn/guide/feature_columns).
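For instance, a numeric feature can be bucketized so the model learns a separate weight per value range. Below is a hedged sketch; the column name follows this tutorial, but the boundaries are made up for illustration:

```python
import numpy as np
import tensorflow as tf

# Base numeric column for one Iris feature.
petal_length = tf.feature_column.numeric_column('PetalLength')

# Hypothetical derived column: split the value into 4 ranges at the
# (made-up) boundaries 2.0, 4.0 and 6.0.
petal_length_buckets = tf.feature_column.bucketized_column(
    petal_length, boundaries=[2.0, 4.0, 6.0])

# What the bucketization computes, shown with plain NumPy:
print(np.digitize([1.7, 4.2, 5.4], bins=[2.0, 4.0, 6.0]))  # bucket index per value
```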
Now that we have described how we want the model to represent the raw features, we can build the Estimator.
## Instantiate an Estimator
The Iris problem is a classic classification problem. Fortunately, TensorFlow provides several pre-made classifier Estimators, including:
* [`tf.estimator.DNNClassifier`](https://tensorflow.google.cn/api_docs/python/tf/estimator/DNNClassifier) for deep models that perform multi-class classification
* [`tf.estimator.DNNLinearCombinedClassifier`](https://tensorflow.google.cn/api_docs/python/tf/estimator/DNNLinearCombinedClassifier) for wide and deep models
* [`tf.estimator.LinearClassifier`](https://tensorflow.google.cn/api_docs/python/tf/estimator/LinearClassifier) for classifiers based on linear models
For the Iris problem, [`tf.estimator.DNNClassifier`](https://tensorflow.google.cn/api_docs/python/tf/estimator/DNNClassifier) seems like the best choice. Here is how you instantiate this Estimator:
```py
# Build a DNN with 2 hidden layers with 30 and 10 hidden nodes each.
classifier = tf.estimator.DNNClassifier(
feature_columns=my_feature_columns,
# Two hidden layers of 30 and 10 nodes respectively.
hidden_units=[30, 10],
# The model must choose between 3 classes.
n_classes=3)
```
```py
INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpkhwws8ja
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpkhwws8ja', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
```
## Train, evaluate, and predict
Now that we have an Estimator object, we can call methods to do the following:
* Train the model.
* Evaluate the trained model.
* Use the trained model to make predictions.
### Train the model
Train the model by calling the Estimator's `train` method as follows:
```py
# Train the model.
classifier.train(
input_fn=lambda: input_fn(train, train_y, training=True),
steps=5000)
```
```py
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:Layer dnn is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2. The layer has dtype float32 because its dtype defaults to floatx.
If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.
To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.
Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/keras/optimizer_v2/adagrad.py:83: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpkhwws8ja/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 1.6968713, step = 0
INFO:tensorflow:global_step/sec: 308.34
INFO:tensorflow:loss = 1.1691835, step = 100 (0.325 sec)
INFO:tensorflow:global_step/sec: 365.112
INFO:tensorflow:loss = 1.0332501, step = 200 (0.274 sec)
INFO:tensorflow:global_step/sec: 365.44
INFO:tensorflow:loss = 0.9807229, step = 300 (0.274 sec)
INFO:tensorflow:global_step/sec: 364.789
INFO:tensorflow:loss = 0.9437329, step = 400 (0.274 sec)
INFO:tensorflow:global_step/sec: 368.124
INFO:tensorflow:loss = 0.94162637, step = 500 (0.272 sec)
INFO:tensorflow:global_step/sec: 366.689
INFO:tensorflow:loss = 0.9129944, step = 600 (0.273 sec)
INFO:tensorflow:global_step/sec: 368.813
INFO:tensorflow:loss = 0.91519016, step = 700 (0.271 sec)
INFO:tensorflow:global_step/sec: 369.377
INFO:tensorflow:loss = 0.8866866, step = 800 (0.271 sec)
INFO:tensorflow:global_step/sec: 371.999
INFO:tensorflow:loss = 0.88594323, step = 900 (0.269 sec)
INFO:tensorflow:global_step/sec: 372.481
INFO:tensorflow:loss = 0.8859284, step = 1000 (0.269 sec)
INFO:tensorflow:global_step/sec: 369.793
INFO:tensorflow:loss = 0.87800217, step = 1100 (0.270 sec)
INFO:tensorflow:global_step/sec: 364.966
INFO:tensorflow:loss = 0.8652306, step = 1200 (0.274 sec)
INFO:tensorflow:global_step/sec: 368.742
INFO:tensorflow:loss = 0.8569569, step = 1300 (0.271 sec)
INFO:tensorflow:global_step/sec: 368.955
INFO:tensorflow:loss = 0.8538004, step = 1400 (0.271 sec)
INFO:tensorflow:global_step/sec: 371.44
INFO:tensorflow:loss = 0.8501439, step = 1500 (0.269 sec)
INFO:tensorflow:global_step/sec: 369.55
INFO:tensorflow:loss = 0.8453819, step = 1600 (0.271 sec)
INFO:tensorflow:global_step/sec: 366
INFO:tensorflow:loss = 0.83854586, step = 1700 (0.273 sec)
INFO:tensorflow:global_step/sec: 370.695
INFO:tensorflow:loss = 0.81984085, step = 1800 (0.270 sec)
INFO:tensorflow:global_step/sec: 371.791
INFO:tensorflow:loss = 0.8254725, step = 1900 (0.271 sec)
INFO:tensorflow:global_step/sec: 363.724
INFO:tensorflow:loss = 0.839285, step = 2000 (0.273 sec)
INFO:tensorflow:global_step/sec: 366.998
INFO:tensorflow:loss = 0.81192434, step = 2100 (0.273 sec)
INFO:tensorflow:global_step/sec: 362.578
INFO:tensorflow:loss = 0.80626756, step = 2200 (0.276 sec)
INFO:tensorflow:global_step/sec: 370.678
INFO:tensorflow:loss = 0.8144733, step = 2300 (0.270 sec)
INFO:tensorflow:global_step/sec: 367.415
INFO:tensorflow:loss = 0.80486006, step = 2400 (0.272 sec)
INFO:tensorflow:global_step/sec: 363.869
INFO:tensorflow:loss = 0.7996403, step = 2500 (0.275 sec)
INFO:tensorflow:global_step/sec: 366.247
INFO:tensorflow:loss = 0.78972137, step = 2600 (0.273 sec)
INFO:tensorflow:global_step/sec: 366.514
INFO:tensorflow:loss = 0.7898851, step = 2700 (0.273 sec)
INFO:tensorflow:global_step/sec: 363.635
INFO:tensorflow:loss = 0.7798088, step = 2800 (0.275 sec)
INFO:tensorflow:global_step/sec: 371.201
INFO:tensorflow:loss = 0.7830296, step = 2900 (0.269 sec)
INFO:tensorflow:global_step/sec: 372.843
INFO:tensorflow:loss = 0.78415155, step = 3000 (0.268 sec)
INFO:tensorflow:global_step/sec: 370.754
INFO:tensorflow:loss = 0.7710204, step = 3100 (0.270 sec)
INFO:tensorflow:global_step/sec: 373.092
INFO:tensorflow:loss = 0.7817295, step = 3200 (0.268 sec)
INFO:tensorflow:global_step/sec: 369.337
INFO:tensorflow:loss = 0.78129435, step = 3300 (0.271 sec)
INFO:tensorflow:global_step/sec: 368.646
INFO:tensorflow:loss = 0.78726315, step = 3400 (0.271 sec)
INFO:tensorflow:global_step/sec: 367.989
INFO:tensorflow:loss = 0.76692796, step = 3500 (0.273 sec)
INFO:tensorflow:global_step/sec: 365.108
INFO:tensorflow:loss = 0.7719732, step = 3600 (0.272 sec)
INFO:tensorflow:global_step/sec: 370.532
INFO:tensorflow:loss = 0.76764953, step = 3700 (0.270 sec)
INFO:tensorflow:global_step/sec: 362.993
INFO:tensorflow:loss = 0.75807786, step = 3800 (0.277 sec)
INFO:tensorflow:global_step/sec: 365.707
INFO:tensorflow:loss = 0.7590251, step = 3900 (0.272 sec)
INFO:tensorflow:global_step/sec: 368.977
INFO:tensorflow:loss = 0.7478892, step = 4000 (0.271 sec)
INFO:tensorflow:global_step/sec: 370.263
INFO:tensorflow:loss = 0.74537545, step = 4100 (0.270 sec)
INFO:tensorflow:global_step/sec: 370.648
INFO:tensorflow:loss = 0.7506561, step = 4200 (0.270 sec)
INFO:tensorflow:global_step/sec: 372.419
INFO:tensorflow:loss = 0.74983096, step = 4300 (0.268 sec)
INFO:tensorflow:global_step/sec: 370.771
INFO:tensorflow:loss = 0.74485517, step = 4400 (0.270 sec)
INFO:tensorflow:global_step/sec: 371.489
INFO:tensorflow:loss = 0.74746263, step = 4500 (0.269 sec)
INFO:tensorflow:global_step/sec: 370.063
INFO:tensorflow:loss = 0.7356381, step = 4600 (0.270 sec)
INFO:tensorflow:global_step/sec: 370.305
INFO:tensorflow:loss = 0.74623525, step = 4700 (0.270 sec)
INFO:tensorflow:global_step/sec: 365.488
INFO:tensorflow:loss = 0.7425093, step = 4800 (0.274 sec)
INFO:tensorflow:global_step/sec: 370.235
INFO:tensorflow:loss = 0.7342787, step = 4900 (0.270 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 5000...
INFO:tensorflow:Saving checkpoints for 5000 into /tmp/tmpkhwws8ja/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 5000...
INFO:tensorflow:Loss for final step: 0.7211363.
<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifierV2 at 0x7f16ef6d0cf8>
```
Note that the `input_fn` call is wrapped in a [`lambda`](https://docs.python.org/3/tutorial/controlflow.html) to capture the arguments while providing an input function that takes no arguments, as the Estimator expects. The `steps` argument tells the method to stop training after a number of training steps.
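`functools.partial` from the standard library binds arguments the same way the lambda does. A small stand-in sketch (the string arguments only stand in for the real DataFrames, and this `input_fn` only mimics the tutorial's signature):

```python
from functools import partial

def input_fn(features, labels, training=True, batch_size=256):
    # Stand-in with the same signature as the tutorial's input_fn;
    # the real one returns a tf.data.Dataset.
    return (features, labels, training, batch_size)

# classifier.train expects a zero-argument callable; both forms provide one:
wrapped_lambda = lambda: input_fn('train', 'train_y', training=True)
wrapped_partial = partial(input_fn, 'train', 'train_y', training=True)

print(wrapped_lambda() == wrapped_partial())  # -> True
```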
### Evaluate the trained model
Now that the model has been trained, we can get some statistics on its performance. The following code block evaluates the accuracy of the trained model on the test data:
```py
eval_result = classifier.evaluate(
input_fn=lambda: input_fn(test, test_y, training=False))
print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))
```
```py
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:Layer dnn is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2. The layer has dtype float32 because its dtype defaults to floatx.
If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.
To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-09-22T19:58:23Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpkhwws8ja/model.ckpt-5000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference Time : 0.20579s
INFO:tensorflow:Finished evaluation at 2020-09-22-19:58:23
INFO:tensorflow:Saving dict for global step 5000: accuracy = 0.53333336, average_loss = 0.760622, global_step = 5000, loss = 0.760622
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 5000: /tmp/tmpkhwws8ja/model.ckpt-5000
Test set accuracy: 0.533
```
Unlike the call to the `train` method, we did not pass a `steps` argument for evaluation. The `input_fn` used for evaluation only yields a single [epoch](https://developers.google.cn/machine-learning/glossary/#epoch) of data.
The `eval_result` dictionary also contains `average_loss` (the mean loss per sample), `loss` (the mean loss per mini-batch), and the value of the Estimator's `global_step` (the number of training iterations it underwent).
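The `'{accuracy:0.3f}'.format(**eval_result)` call in the evaluation code works because `evaluate` returns a plain dict. A stand-in sketch, with made-up values shaped like the log output above:

```python
# Made-up values using the same keys the evaluation above reports.
eval_result = {'accuracy': 0.53333336, 'average_loss': 0.760622,
               'loss': 0.760622, 'global_step': 5000}

# format(**eval_result) unpacks the dict into keyword arguments.
print('Test set accuracy: {accuracy:0.3f}'.format(**eval_result))
```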
### Making predictions (inferring) from the trained model
We now have a trained model that produces good evaluation results. We can now use it to predict the species of an Iris flower based on some unlabeled measurements. As with training and evaluation, we make predictions using a single function call:
```py
# Generate predictions from the model
expected = ['Setosa', 'Versicolor', 'Virginica']
predict_x = {
'SepalLength': [5.1, 5.9, 6.9],
'SepalWidth': [3.3, 3.0, 3.1],
'PetalLength': [1.7, 4.2, 5.4],
'PetalWidth': [0.5, 1.5, 2.1],
}
def input_fn(features, batch_size=256):
"""An input function for prediction."""
# Convert the inputs to a Dataset without labels.
return tf.data.Dataset.from_tensor_slices(dict(features)).batch(batch_size)
predictions = classifier.predict(
input_fn=lambda: input_fn(predict_x))
```
The `predict` method returns a Python iterable, yielding a dictionary of prediction results for each example. The following code prints a few predictions and their probabilities:
```py
for pred_dict, expec in zip(predictions, expected):
class_id = pred_dict['class_ids'][0]
probability = pred_dict['probabilities'][class_id]
print('Prediction is "{}" ({:.1f}%), expected "{}"'.format(
SPECIES[class_id], 100 * probability, expec))
```
```py
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpkhwws8ja/model.ckpt-5000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Prediction is "Versicolor" (36.6%), expected "Setosa"
Prediction is "Virginica" (50.9%), expected "Versicolor"
Prediction is "Virginica" (62.6%), expected "Virginica"
```
# Build a linear model with Estimators
> Source: [https://tensorflow.google.cn/tutorials/estimator/linear](https://tensorflow.google.cn/tutorials/estimator/linear)
## Overview
This end-to-end walkthrough trains a logistic regression model using the [`tf.estimator`](https://tensorflow.google.cn/api_docs/python/tf/estimator) API. The model is often used as a baseline for other, more complex, algorithms.
## Setup
```py
pip install -q sklearn
```
```py
WARNING: You are using pip version 20.2.2; however, version 20.2.3 is available.
You should consider upgrading via the '/tmpfs/src/tf_docs_env/bin/python -m pip install --upgrade pip' command.
```
```py
import os
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import clear_output
from six.moves import urllib
```
## Load the titanic dataset
You will use the Titanic dataset with the (rather morbid) goal of predicting passenger survival, given characteristics such as gender, age, class, etc.
```py
import tensorflow.compat.v2.feature_column as fc
import tensorflow as tf
```
```py
# Load dataset.
dftrain = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')
dfeval = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv')
y_train = dftrain.pop('survived')
y_eval = dfeval.pop('survived')
```
## Explore the data
The dataset contains the following features
```py
dftrain.head()
```
```py
dftrain.describe()
```
There are 627 and 264 examples in the training and evaluation sets, respectively.
```py
dftrain.shape[0], dfeval.shape[0]
```
```py
(627, 264)
```
The majority of passengers are in their 20's and 30's.
```py
dftrain.age.hist(bins=20)
```
```py
<matplotlib.axes._subplots.AxesSubplot at 0x7f8e946914a8>
```
![png](img/7d1de3cd2c94ab5fb2b9e44445a2fa6b.png)
There are approximately twice as many male passengers as female passengers aboard.
```py
dftrain.sex.value_counts().plot(kind='barh')
```
```py
<matplotlib.axes._subplots.AxesSubplot at 0x7f8e925da208>
```
![png](img/2ab61e10f9f53c1738f397150ea65f3d.png)
The majority of passengers were in the "third" class.
```py
dftrain['class'].value_counts().plot(kind='barh')
```
```py
<matplotlib.axes._subplots.AxesSubplot at 0x7f8e920e0588>
```
![png](img/90c153ba31f6c32d7d760bc031b5d956.png)
Females have a much higher chance of surviving versus males. This is clearly a predictive feature for the model.
```py
pd.concat([dftrain, y_train], axis=1).groupby('sex').survived.mean().plot(kind='barh').set_xlabel('% survive')
```
```py
Text(0.5, 0, '% survive')
```
![png](img/aaf0cfc73c7f275786e66d759ad26df6.png)
## Feature Engineering for the Model
Estimators use a system called [feature columns](https://tensorflow.google.cn/guide/feature_columns) to describe how the model should interpret each of the raw input features. An Estimator expects a vector of numeric inputs, and *feature columns* describe how the model should convert each feature.
Selecting and crafting the right set of feature columns is key to learning an effective model. A feature column can be either one of the raw inputs in the original features `dict` (a *base feature column*), or any new column created using transformations defined over one or multiple base columns (a *derived feature column*).
The linear estimator uses both numeric and categorical features. Feature columns work with all TensorFlow estimators and their purpose is to define the features used for modeling. Additionally, they provide some feature engineering capabilities like one-hot-encoding, normalization, and bucketization.
### Base Feature Columns
```py
CATEGORICAL_COLUMNS = ['sex', 'n_siblings_spouses', 'parch', 'class', 'deck',
'embark_town', 'alone']
NUMERIC_COLUMNS = ['age', 'fare']
feature_columns = []
for feature_name in CATEGORICAL_COLUMNS:
vocabulary = dftrain[feature_name].unique()
feature_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary))
for feature_name in NUMERIC_COLUMNS:
feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.float32))
```
The `input_function` specifies how data is converted to a [`tf.data.Dataset`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset) that feeds the input pipeline in a streaming fashion. [`tf.data.Dataset`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset) can take in multiple sources such as a dataframe, a csv-formatted file, and more.
```py
def make_input_fn(data_df, label_df, num_epochs=10, shuffle=True, batch_size=32):
def input_function():
ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))
if shuffle:
ds = ds.shuffle(1000)
ds = ds.batch(batch_size).repeat(num_epochs)
return ds
return input_function
train_input_fn = make_input_fn(dftrain, y_train)
eval_input_fn = make_input_fn(dfeval, y_eval, num_epochs=1, shuffle=False)
```
You can inspect the dataset:
```py
ds = make_input_fn(dftrain, y_train, batch_size=10)()
for feature_batch, label_batch in ds.take(1):
print('Some feature keys:', list(feature_batch.keys()))
print()
print('A batch of class:', feature_batch['class'].numpy())
print()
print('A batch of Labels:', label_batch.numpy())
```
```py
Some feature keys: ['sex', 'age', 'n_siblings_spouses', 'parch', 'fare', 'class', 'deck', 'embark_town', 'alone']
A batch of class: [b'Third' b'Third' b'Third' b'Third' b'First' b'Third' b'Third' b'First'
b'Third' b'Third']
A batch of Labels: [1 0 0 0 1 0 0 0 0 0]
```
You can also inspect the result of a specific feature column using the [`tf.keras.layers.DenseFeatures`](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/DenseFeatures) layer:
```py
age_column = feature_columns[7]
tf.keras.layers.DenseFeatures([age_column])(feature_batch).numpy()
```
```py
WARNING:tensorflow:Layer dense_features is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2. The layer has dtype float32 because its dtype defaults to floatx.
If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.
To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.
array([[27.],
[28.],
[30.],
[18.],
[32.],
[26.],
[61.],
[37.],
[28.],
[40.]], dtype=float32)
```
`DenseFeatures` only accepts dense tensors; to inspect a categorical column you need to transform it to an indicator column first:
```py
gender_column = feature_columns[0]
tf.keras.layers.DenseFeatures([tf.feature_column.indicator_column(gender_column)])(feature_batch).numpy()
```
```py
WARNING:tensorflow:Layer dense_features_1 is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2. The layer has dtype float32 because its dtype defaults to floatx.
If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.
To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.
array([[1., 0.],
[1., 0.],
[1., 0.],
[0., 1.],
[1., 0.],
[1., 0.],
[1., 0.],
[1., 0.],
[1., 0.],
[1., 0.]], dtype=float32)
```
After adding all the base features to the model, let's train the model. Training a model is just a single command using the [`tf.estimator`](https://tensorflow.google.cn/api_docs/python/tf/estimator) API:
```py
linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)
linear_est.train(train_input_fn)
result = linear_est.evaluate(eval_input_fn)
clear_output()
print(result)
```
```py
{'accuracy': 0.7613636, 'accuracy_baseline': 0.625, 'auc': 0.809244, 'auc_precision_recall': 0.75609726, 'average_loss': 0.5452906, 'label/mean': 0.375, 'loss': 0.5347039, 'precision': 0.75, 'prediction/mean': 0.27201703, 'recall': 0.54545456, 'global_step': 200}
```
### Derived Feature Columns
Now you have reached an accuracy of 75%. Using each base feature column separately may not be enough to explain the data. For example, the correlation between age and the label may differ across genders. Therefore, if you only learn a single model weight for `gender="Male"` and `gender="Female"`, you won't capture every age-gender combination (e.g. distinguishing between `gender="Male"` AND `age="30"` versus `gender="Male"` AND `age="40"`).
To learn the differences between different feature combinations, you can add *crossed feature columns* to the model (you can also bucketize age column before the cross column):
```py
age_x_gender = tf.feature_column.crossed_column(['age', 'sex'], hash_bucket_size=100)
```
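As mentioned, the age column can be bucketized before crossing, so each (age-range, sex) pair gets its own weight. A hedged sketch; the boundaries below are illustrative, not taken from the tutorial:

```python
import tensorflow as tf

age = tf.feature_column.numeric_column('age')
# Illustrative age ranges; each boundary starts a new bucket.
age_buckets = tf.feature_column.bucketized_column(
    age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
# Cross the bucketized age with sex, hashed into 100 buckets.
age_buckets_x_gender = tf.feature_column.crossed_column(
    [age_buckets, 'sex'], hash_bucket_size=100)
```

The crossed column could then be appended to `derived_feature_columns` alongside `age_x_gender`.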
After adding the combination feature to the model, let's train the model again:
```py
derived_feature_columns = [age_x_gender]
linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns+derived_feature_columns)
linear_est.train(train_input_fn)
result = linear_est.evaluate(eval_input_fn)
clear_output()
print(result)
```
```py
{'accuracy': 0.7613636, 'accuracy_baseline': 0.625, 'auc': 0.84352624, 'auc_precision_recall': 0.78346276, 'average_loss': 0.48114488, 'label/mean': 0.375, 'loss': 0.4756022, 'precision': 0.65789473, 'prediction/mean': 0.4285249, 'recall': 0.75757575, 'global_step': 200}
```
The model now does slightly better than the one trained only on base features: in this run, the AUC improves from 0.809 to 0.844. You can try using more features and transformations to see if you can do better!
Now you can use the trained model to make predictions on a passenger from the evaluation set. TensorFlow models are optimized to make predictions on a batch, or collection, of examples at once. Earlier, the `eval_input_fn` was defined using the entire evaluation set.
```py
pred_dicts = list(linear_est.predict(eval_input_fn))
probs = pd.Series([pred['probabilities'][1] for pred in pred_dicts])
probs.plot(kind='hist', bins=20, title='predicted probabilities')
```
```py
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:Layer linear/linear_model is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2. The layer has dtype float32 because its dtype defaults to floatx.
If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.
To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpg17o3o7e/model.ckpt-200
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
<matplotlib.axes._subplots.AxesSubplot at 0x7f8e2c1dd358>
```
![png](img/5fcd4749c7b37cf8714bd83753d1da5b.png)
Finally, look at the receiver operating characteristic (ROC) of the results, which will give us a better idea of the tradeoff between the true positive rate and false positive rate.
```py
from sklearn.metrics import roc_curve
from matplotlib import pyplot as plt
fpr, tpr, _ = roc_curve(y_eval, probs)
plt.plot(fpr, tpr)
plt.title('ROC curve')
plt.xlabel('false positive rate')
plt.ylabel('true positive rate')
plt.xlim(0,)
plt.ylim(0,)
```
```py
(0.0, 1.05)
```
![png](img/2230343d999d9f0dd8b71b8bf390e82f.png)
# Training Boosted Trees models in TensorFlow
> Source: [https://tensorflow.google.cn/tutorials/estimator/boosted_trees](https://tensorflow.google.cn/tutorials/estimator/boosted_trees)
**Note:** Our TensorFlow community has translated these documents. Because community translations are best-effort, there is no guarantee that they accurately reflect the latest [official English documentation](https://tensorflow.google.cn/?hl=en). If you have suggestions to improve this translation, please submit a pull request to the [tensorflow/docs](https://github.com/tensorflow/docs) GitHub repository. To volunteer to write or review translations, please join the [docs-zh-cn@tensorflow.org Google Group](https://groups.google.com/a/tensorflow.org/forum/#!forum/docs-zh-cn).
This tutorial is an end-to-end walkthrough of training a Gradient Boosting model using decision trees with the [`tf.estimator`](https://tensorflow.google.cn/api_docs/python/tf/estimator) API. Boosted Trees models are among the most popular and effective machine learning approaches for both regression and classification. They are an ensemble technique that combines the predictions of several (tens, hundreds, or even thousands of) tree models.
Boosted Trees models are popular with many machine learning practitioners because they can achieve impressive performance with minimal hyperparameter tuning.
## Load the Titanic dataset
You will use the Titanic dataset, where the goal is to predict passenger survival given characteristics such as gender, age, class, etc.
```py
import numpy as np
import pandas as pd
from IPython.display import clear_output
from matplotlib import pyplot as plt
# Load the dataset.
dftrain = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')
dfeval = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv')
y_train = dftrain.pop('survived')
y_eval = dfeval.pop('survived')
```
```py
import tensorflow as tf
tf.random.set_seed(123)
```
The dataset consists of a training set and an evaluation set:
* `dftrain` and `y_train` are the *training set*: the data the model uses to learn.
* The model is tested against the *evaluation set*: `dfeval` and `y_eval`.
You will use the following features for training:
| Feature name | Description |
| --- | --- |
| sex | Gender of the passenger |
| age | Age of the passenger |
| n_siblings_spouses | Siblings and spouses aboard |
| parch | Parents and children aboard |
| fare | Fare the passenger paid |
| class | The passenger's class on the ship |
| deck | Which deck the passenger was on |
| embark_town | Which town the passenger embarked from |
| alone | Whether the passenger was alone |
## Explore the data
Let's first preview some of the data and create summary statistics on the training set.
```py
dftrain.head()
```
```py
dftrain.describe()
```
There are 627 and 264 examples in the training and evaluation sets, respectively.
```py
dftrain.shape[0], dfeval.shape[0]
```
```py
(627, 264)
```
The majority of passengers are in their 20s and 30s.
```py
dftrain.age.hist(bins=20)
plt.show()
```
![png](img/58d9d20121aa86120aded9afa9cfff6d.png)
There are approximately twice as many male passengers as female passengers aboard.
```py
dftrain.sex.value_counts().plot(kind='barh')
plt.show()
```
![png](img/3c3d7b5efcc814913b1fdc4d8ab17c2c.png)
The majority of passengers were in the "third" class.
```py
dftrain['class'].value_counts().plot(kind='barh')
plt.show()
```
![png](img/4630405ff1451bfc3979433eb4bb7a43.png)
Most passengers embarked from Southampton.
```py
dftrain['embark_town'].value_counts().plot(kind='barh')
plt.show()
```
![png](img/a3920eb34218a65a21b046a30c7d3808.png)
Females have a much higher chance of surviving than males. This is clearly a predictive feature for the model.
```py
pd.concat([dftrain, y_train], axis=1).groupby('sex').survived.mean().plot(kind='barh').set_xlabel('% survive')
plt.show()
```
![png](img/2c848f6027c084a244c86c336c02ce35.png)
## Create feature columns and input functions
The Gradient Boosting estimator can utilize both numeric and categorical features. Feature columns work with all TensorFlow estimators, and their purpose is to define the features used for modeling. Additionally, they provide some feature-engineering capabilities such as one-hot encoding, normalization, and bucketization. In this tutorial, the fields in `CATEGORICAL_COLUMNS` are transformed from categorical columns to one-hot-encoded columns ([indicator columns](https://tensorflow.google.cn/api_docs/python/tf/feature_column/indicator_column)):
```py
fc = tf.feature_column
CATEGORICAL_COLUMNS = ['sex', 'n_siblings_spouses', 'parch', 'class', 'deck',
'embark_town', 'alone']
NUMERIC_COLUMNS = ['age', 'fare']
def one_hot_cat_column(feature_name, vocab):
return tf.feature_column.indicator_column(
tf.feature_column.categorical_column_with_vocabulary_list(feature_name,
vocab))
feature_columns = []
for feature_name in CATEGORICAL_COLUMNS:
# Need to one-hot encode categorical features.
vocabulary = dftrain[feature_name].unique()
feature_columns.append(one_hot_cat_column(feature_name, vocabulary))
for feature_name in NUMERIC_COLUMNS:
feature_columns.append(tf.feature_column.numeric_column(feature_name,
dtype=tf.float32))
```
You can view the transformation that a feature column produces. For example, here is the output when using `indicator_column` on a single example:
```py
example = dict(dftrain.head(1))
class_fc = tf.feature_column.indicator_column(tf.feature_column.categorical_column_with_vocabulary_list('class', ('First', 'Second', 'Third')))
print('Feature value: "{}"'.format(example['class'].iloc[0]))
print('One-hot encoded: ', tf.keras.layers.DenseFeatures([class_fc])(example).numpy())
```
```py
Feature value: "Third"
One-hot encoded:  [[0. 0. 1.]]
```
Additionally, you can view all of the feature column transformations together:
```py
tf.keras.layers.DenseFeatures(feature_columns)(example).numpy()
```
```py
array([[22.  ,  1.  ,  0.  ,  1.  ,  0.  ,  0.  ,  1.  ,  0.  ,
         0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  1.  ,  0.  ,
         0.  ,  0.  ,  7.25,  1.  ,  0.  ,  0.  ,  0.  ,  0.  ,
         0.  ,  0.  ,  1.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
         1.  ,  0.  ]], dtype=float32)
```
Next you need to create the input functions. These specify how data will be read into our model for both training and inference. You will use the `from_tensor_slices` method in the [`tf.data`](https://tensorflow.google.cn/api_docs/python/tf/data) API to read data directly from Pandas. This is suitable for smaller, in-memory datasets. For larger datasets, the tf.data API supports a variety of file formats (including [csv](https://tensorflow.google.cn/api_docs/python/tf/data/experimental/make_csv_dataset)) so that you can process datasets that do not fit in memory.
```py
# Use the entire dataset as one batch, since the dataset is so small.
NUM_EXAMPLES = len(y_train)
def make_input_fn(X, y, n_epochs=None, shuffle=True):
def input_fn():
dataset = tf.data.Dataset.from_tensor_slices((dict(X), y))
if shuffle:
dataset = dataset.shuffle(NUM_EXAMPLES)
# For training, cycle through the dataset as many times as needed (n_epochs=None).
dataset = dataset.repeat(n_epochs)
# In-memory training doesn't use batching.
dataset = dataset.batch(NUM_EXAMPLES)
return dataset
return input_fn
# Training and evaluation input functions.
train_input_fn = make_input_fn(dftrain, y_train)
eval_input_fn = make_input_fn(dfeval, y_eval, shuffle=False, n_epochs=1)
```
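As noted above, the same (features, label) stream can come straight from CSV files when the data does not fit in memory. A hedged sketch using `tf.data.experimental.make_csv_dataset`; the file path and the `survived` label name are assumptions matching this dataset's layout:

```python
import tensorflow as tf

def make_csv_input_fn(file_pattern, label_name, batch_size=32):
    def input_fn():
        # Streams shuffled batches of (features-dict, label) directly
        # from CSV files on disk, without loading everything into memory.
        return tf.data.experimental.make_csv_dataset(
            file_pattern, batch_size=batch_size,
            label_name=label_name, num_epochs=1)
    return input_fn

# e.g. train_input_fn = make_csv_input_fn('titanic_train.csv', 'survived')
```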
## Train and evaluate the model
You will do the following steps:
1. Initialize the model, specifying the features and hyperparameters.
2. Feed the training data to the model using `train_input_fn`, and train the model using the `train` function.
3. Assess model performance using the evaluation set, i.e. the `dfeval` DataFrame, and verify that the predictions match the labels in the `y_eval` array.
Before training a Boosted Trees model, let's first train a linear classifier (a logistic regression model). It is best practice to start with a simpler model to establish a baseline.
```py
linear_est = tf.estimator.LinearClassifier(feature_columns)
# Train the model.
linear_est.train(train_input_fn, max_steps=100)
# Evaluation.
result = linear_est.evaluate(eval_input_fn)
clear_output()
print(pd.Series(result))
```
```py
accuracy 0.765152
accuracy_baseline 0.625000
auc 0.832844
auc_precision_recall 0.789631
average_loss 0.478908
global_step 100.000000
label/mean 0.375000
loss 0.478908
precision 0.703297
prediction/mean 0.350790
recall 0.646465
dtype: float64
```
Next, let's train a Boosted Trees model. Boosted Trees support both regression (`BoostedTreesRegressor`) and classification (`BoostedTreesClassifier`). Since the goal is to predict survival, a yes/no label, you will use the `BoostedTreesClassifier`.
```py
# Since the data fits into memory, use the entire dataset per layer; it will be faster.
# Above, one batch was defined as the entire dataset.
n_batches = 1
est = tf.estimator.BoostedTreesClassifier(feature_columns,
                                          n_batches_per_layer=n_batches)
# The model will stop training once the specified number of trees is built,
# not based on the number of training steps.
est.train(train_input_fn, max_steps=100)
# Evaluate.
result = est.evaluate(eval_input_fn)
clear_output()
print(pd.Series(result))
```
```py
accuracy 0.829545
accuracy_baseline 0.625000
auc 0.872788
auc_precision_recall 0.857807
average_loss 0.411839
global_step 100.000000
label/mean 0.375000
loss 0.411839
precision 0.793478
prediction/mean 0.381942
recall 0.737374
dtype: float64
```
Now you can use the trained model to make predictions on passengers from the evaluation set. TensorFlow models are optimized to make predictions on a batch, or collection, of examples at once. Earlier, `eval_input_fn` was defined using the entire evaluation set.
```py
pred_dicts = list(est.predict(eval_input_fn))
probs = pd.Series([pred['probabilities'][1] for pred in pred_dicts])
probs.plot(kind='hist', bins=20, title='predicted probabilities')
plt.show()
```
![png](img/56a137f761015af5a025d2d0cc2a9985.png)
Finally, you can also look at the receiver operating characteristic (ROC) curve of the results, which gives a better sense of the tradeoff between the true positive rate and the false positive rate.
```py
from sklearn.metrics import roc_curve
fpr, tpr, _ = roc_curve(y_eval, probs)
plt.plot(fpr, tpr)
plt.title('ROC curve')
plt.xlabel('false positive rate')
plt.ylabel('true positive rate')
plt.xlim(0,)
plt.ylim(0,)
plt.show()
```
![png](img/bf058b152584cc8e8c3987a57eb7331f.png)
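As a complement to the ROC curve, the area under it (AUC) summarizes ranking quality in a single number. A minimal sketch using only NumPy; the helper `roc_auc` and the toy labels and scores below are illustrative, not part of the tutorial's code:

```python
import numpy as np

def roc_auc(labels, scores):
    """Compute ROC AUC via the rank-sum formulation (assumes no tied scores)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    # Assign rank 1..n to the scores in ascending order.
    order = scores.argsort()
    ranks = np.empty_like(order, dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    # AUC = (sum of positive ranks - n_pos*(n_pos+1)/2) / (n_pos * n_neg)
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Toy example: mostly higher scores for positives give an AUC well above 0.5.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In practice you would use `sklearn.metrics.roc_auc_score(y_eval, probs)`, which handles ties and edge cases; the sketch only shows what the number measures.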
# Gradient Boosted Trees: model understanding
> 原文:[https://tensorflow.google.cn/tutorials/estimator/boosted_trees_model_understanding](https://tensorflow.google.cn/tutorials/estimator/boosted_trees_model_understanding)
**Note:** This document is a community translation of the official English documentation. Community translations are best-effort, so there is no guarantee that they are accurate or reflect the latest [official English documentation](https://tensorflow.google.cn/?hl=en). To suggest improvements to this translation, please submit a pull request to the [tensorflow/docs](https://github.com/tensorflow/docs) GitHub repository. To volunteer to write or review translations, join the [docs-zh-cn@tensorflow.org Google Group](https://groups.google.com/a/tensorflow.org/forum/#!forum/docs-zh-cn).
For an end-to-end walkthrough of training a Gradient Boosting model, check out [how to train Boosted Trees models in TensorFlow](https://tensorflow.google.cn/tutorials/estimator/boosted_trees). In this tutorial you will:
* Learn how to interpret a Boosted Trees model both *locally* and *globally*
* Gain intuition for how a Boosted Trees model fits a dataset
## How to interpret Boosted Trees models both locally and globally
Local interpretability refers to understanding a model's predictions at the level of a single example, while global interpretability refers to understanding the model as a whole. Such techniques can help machine learning practitioners detect bias and bugs during the model development stage.
For local interpretability, you will learn how to create and visualize per-instance contributions. To distinguish these from feature importances, these values are referred to as directional feature contributions (DFCs).
For global interpretability, you will retrieve and visualize gain-based feature importances, [permutation feature importances](https://www.stat.berkeley.edu/%7Ebreiman/randomforest2001.pdf), and aggregated DFCs.
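The additive property that makes DFCs useful can be illustrated with toy numbers (the bias and contribution values below are invented for illustration, not taken from the Titanic model): the model's baseline prediction plus the per-feature contributions recovers the final predicted probability.

```python
# Hypothetical bias and per-feature directional contributions for one example.
bias = 0.38                      # the model's baseline prediction
dfcs = {'sex': 0.30, 'fare': 0.05, 'age': -0.08, 'class': 0.12}

# Contributions + bias == predicted probability for this example.
prediction = bias + sum(dfcs.values())
print(round(prediction, 2))  # 0.77
```

Each signed value says how much that feature pushed this particular prediction up or down, which is exactly what the plots later in this tutorial visualize.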
## Load the Titanic dataset
You will use the Titanic dataset, where the goal is to predict passenger survival given characteristics such as gender, age, and class.
```py
import numpy as np
import pandas as pd
from IPython.display import clear_output
# Load the dataset.
dftrain = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')
dfeval = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv')
y_train = dftrain.pop('survived')
y_eval = dfeval.pop('survived')
```
```py
import tensorflow as tf
tf.random.set_seed(123)
```
```py
TensorFlow 2.x selected.
```
For a description of the features, please review the previous tutorial.
## Create feature columns and input functions, and train the estimator
### Preprocess the data
Create the feature columns, using the original numeric columns as-is and one-hot encoding the non-numeric features (such as gender and class).
```py
fc = tf.feature_column
CATEGORICAL_COLUMNS = ['sex', 'n_siblings_spouses', 'parch', 'class', 'deck',
                       'embark_town', 'alone']
NUMERIC_COLUMNS = ['age', 'fare']

def one_hot_cat_column(feature_name, vocab):
  return fc.indicator_column(
      fc.categorical_column_with_vocabulary_list(feature_name,
                                                 vocab))

feature_columns = []
for feature_name in CATEGORICAL_COLUMNS:
  # Non-numeric features need to be one-hot encoded.
  vocabulary = dftrain[feature_name].unique()
  feature_columns.append(one_hot_cat_column(feature_name, vocabulary))

for feature_name in NUMERIC_COLUMNS:
  feature_columns.append(fc.numeric_column(feature_name,
                                           dtype=tf.float32))
```
### Build the input pipeline
Create the input functions using the `from_tensor_slices` method in the [`tf.data`](https://tensorflow.google.cn/api_docs/python/tf/data) API to read data directly from Pandas.
```py
# Use the entire dataset as a single batch, since it is small.
NUM_EXAMPLES = len(y_train)

def make_input_fn(X, y, n_epochs=None, shuffle=True):
  def input_fn():
    dataset = tf.data.Dataset.from_tensor_slices((X.to_dict(orient='list'), y))
    if shuffle:
      dataset = dataset.shuffle(NUM_EXAMPLES)
    # For training, cycle through the dataset as many times as needed (n_epochs=None).
    dataset = (dataset
               .repeat(n_epochs)
               .batch(NUM_EXAMPLES))
    return dataset
  return input_fn

# Input functions for training and evaluation.
train_input_fn = make_input_fn(dftrain, y_train)
eval_input_fn = make_input_fn(dfeval, y_eval, shuffle=False, n_epochs=1)
```
### Train the model
```py
params = {
  'n_trees': 50,
  'max_depth': 3,
  'n_batches_per_layer': 1,
  # You must enable center_bias = True to get DFCs. This will force the model
  # to make an initial prediction before using any features (e.g. the mean of
  # the training labels for regression, or the log odds for classification
  # when using cross-entropy loss).
  'center_bias': True
}

est = tf.estimator.BoostedTreesClassifier(feature_columns, **params)
# Train the model.
est.train(train_input_fn, max_steps=100)
# Evaluate.
results = est.evaluate(eval_input_fn)
clear_output()
pd.Series(results).to_frame()
```
For performance reasons, when your data fits in memory, we recommend using the `boosted_trees_classifier_train_in_memory` function. However, if training time is not a concern, or if you have a very large dataset and want to do distributed training, use the `tf.estimator.BoostedTreesClassifier` API shown above.
When using this method, you should not batch your input data; instead, operate on the entire dataset.
```py
in_memory_params = dict(params)
in_memory_params['n_batches_per_layer'] = 1

# In-memory input function; do not batch the data.
def make_inmemory_train_input_fn(X, y):
  y = np.expand_dims(y, axis=1)
  def input_fn():
    return dict(X), y
  return input_fn

train_input_fn = make_inmemory_train_input_fn(dftrain, y_train)

# Train the model.
est = tf.estimator.BoostedTreesClassifier(
    feature_columns,
    train_in_memory=True,
    **in_memory_params)
est.train(train_input_fn)
print(est.evaluate(eval_input_fn))
```
```py
INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpec8e696f
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpec8e696f', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
WARNING:tensorflow:Issue encountered when serializing resources.
Type is unsupported, or the types of the items don't match field type in CollectionDef. Note this is a warning and probably safe to ignore.
'_Resource' object has no attribute 'name'
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
WARNING:tensorflow:Issue encountered when serializing resources.
Type is unsupported, or the types of the items don't match field type in CollectionDef. Note this is a warning and probably safe to ignore.
'_Resource' object has no attribute 'name'
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpec8e696f/model.ckpt.
WARNING:tensorflow:Issue encountered when serializing resources.
Type is unsupported, or the types of the items don't match field type in CollectionDef. Note this is a warning and probably safe to ignore.
'_Resource' object has no attribute 'name'
INFO:tensorflow:loss = 0.6931472, step = 0
WARNING:tensorflow:It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
INFO:tensorflow:global_step/sec: 80.2732
INFO:tensorflow:loss = 0.34654337, step = 99 (1.249 sec)
INFO:tensorflow:Saving checkpoints for 153 into /tmp/tmpec8e696f/model.ckpt.
WARNING:tensorflow:Issue encountered when serializing resources.
Type is unsupported, or the types of the items don't match field type in CollectionDef. Note this is a warning and probably safe to ignore.
'_Resource' object has no attribute 'name'
INFO:tensorflow:Loss for final step: 0.31796658.
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-03-09T21:21:14Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpec8e696f/model.ckpt-153
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference Time : 0.55945s
INFO:tensorflow:Finished evaluation at 2020-03-09-21:21:15
INFO:tensorflow:Saving dict for global step 153: accuracy = 0.8030303, accuracy_baseline = 0.625, auc = 0.8679216, auc_precision_recall = 0.8527449, average_loss = 0.4203342, global_step = 153, label/mean = 0.375, loss = 0.4203342, precision = 0.7473684, prediction/mean = 0.38673538, recall = 0.7171717
WARNING:tensorflow:Issue encountered when serializing resources.
Type is unsupported, or the types of the items don't match field type in CollectionDef. Note this is a warning and probably safe to ignore.
'_Resource' object has no attribute 'name'
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 153: /tmp/tmpec8e696f/model.ckpt-153
{'accuracy': 0.8030303, 'accuracy_baseline': 0.625, 'auc': 0.8679216, 'auc_precision_recall': 0.8527449, 'average_loss': 0.4203342, 'label/mean': 0.375, 'loss': 0.4203342, 'precision': 0.7473684, 'prediction/mean': 0.38673538, 'recall': 0.7171717, 'global_step': 153}
```
## Model interpretation and plotting
```py
import matplotlib.pyplot as plt
import seaborn as sns
sns_colors = sns.color_palette('colorblind')
```
## Local interpretability
Next, you will output directional feature contributions (DFCs) to explain individual predictions, using the approach outlined in [Palczewska et al.](https://arxiv.org/pdf/1312.1121.pdf) and by Saabas in [Interpreting Random Forests](http://blog.datadive.net/interpreting-random-forests/) (this method is also available in scikit-learn for random forests in the [`treeinterpreter`](https://github.com/andosa/treeinterpreter) package). Output the DFCs with:
`pred_dicts = list(est.experimental_predict_with_explanations(pred_input_fn))`
(Note: the method is named with an "experimental" prefix because it is still in development and may be modified before a final release.)
```py
pred_dicts = list(est.experimental_predict_with_explanations(eval_input_fn))
```
```py
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpec8e696f', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpec8e696f/model.ckpt-153
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
```
```py
# Create a DataFrame of DFCs.
labels = y_eval.values
probs = pd.Series([pred['probabilities'][1] for pred in pred_dicts])
df_dfc = pd.DataFrame([pred['dfc'] for pred in pred_dicts])
df_dfc.describe().T
```
A nice property of DFCs is that the sum of the contributions plus the bias equals the prediction for a given example.
```py
# Sum of DFCs + bias == probability.
bias = pred_dicts[0]['bias']
dfc_prob = df_dfc.sum(axis=1) + bias
np.testing.assert_almost_equal(dfc_prob.values,
                               probs.values)
```
Plot the DFCs for an individual passenger, coloring each bar by the directionality of its contribution and adding the feature values to the figure.
```py
# Boilerplate code for plotting :)
def _get_color(value):
  """Positive DFCs are plotted in green, negative in red."""
  green, red = sns.color_palette()[2:4]
  if value >= 0: return green
  return red

def _add_feature_values(feature_values, ax):
  """Display feature values on the left side of the plot."""
  x_coord = ax.get_xlim()[0]
  OFFSET = 0.15
  for y_coord, (feat_name, feat_val) in enumerate(feature_values.items()):
    t = plt.text(x_coord, y_coord - OFFSET, '{}'.format(feat_val), size=12)
    t.set_bbox(dict(facecolor='white', alpha=0.5))
  from matplotlib.font_manager import FontProperties
  font = FontProperties()
  font.set_weight('bold')
  t = plt.text(x_coord, y_coord + 1 - OFFSET, 'feature\nvalue',
               fontproperties=font, size=12)

def plot_example(example):
  TOP_N = 8  # View top 8 features.
  sorted_ix = example.abs().sort_values()[-TOP_N:].index  # Sort by magnitude.
  example = example[sorted_ix]
  colors = example.map(_get_color).tolist()
  ax = example.to_frame().plot(kind='barh',
                               color=[colors],
                               legend=None,
                               alpha=0.75,
                               figsize=(10,6))
  ax.grid(False, axis='y')
  ax.set_yticklabels(ax.get_yticklabels(), size=14)
  # Add feature values (uses the global `ID` defined in the next cell).
  _add_feature_values(dfeval.iloc[ID][sorted_ix], ax)
  return ax
```
```py
# Plot the results.
ID = 182
example = df_dfc.iloc[ID]  # Choose the i-th example from the evaluation set.
TOP_N = 8  # View top 8 features.
sorted_ix = example.abs().sort_values()[-TOP_N:].index
ax = plot_example(example)
ax.set_title('Feature contributions for example {}\n pred: {:1.2f}; label: {}'.format(ID, probs[ID], labels[ID]))
ax.set_xlabel('Contribution to predicted probability', size=14)
plt.show()
```
![png](img/982e1307bbc8145644b791d775fcc2c7.png)
Contributions with larger magnitudes have a larger impact on the model's prediction. A negative contribution indicates that the feature value for this example reduced the model's prediction, while a positive contribution increased it.
You can also compare this example's DFCs with the distribution over the entire evaluation set using a violin plot.
```py
# Boilerplate plotting code.
def dist_violin_plot(df_dfc, ID):
  # Initialize the plot.
  fig, ax = plt.subplots(1, 1, figsize=(10, 6))

  # Create the example DataFrame.
  TOP_N = 8  # View top 8 features.
  example = df_dfc.iloc[ID]
  ix = example.abs().sort_values()[-TOP_N:].index
  example = example[ix]
  example_df = example.to_frame(name='dfc')

  # Add contributions from the entire distribution.
  parts = ax.violinplot([df_dfc[w] for w in ix],
                        vert=False,
                        showextrema=False,
                        widths=0.7,
                        positions=np.arange(len(ix)))
  face_color = sns_colors[0]
  alpha = 0.15
  for pc in parts['bodies']:
    pc.set_facecolor(face_color)
    pc.set_alpha(alpha)

  # Add the feature values (using the locally sorted index `ix`).
  _add_feature_values(dfeval.iloc[ID][ix], ax)

  # Add the local contributions.
  ax.scatter(example,
             np.arange(example.shape[0]),
             color=sns.color_palette()[2],
             s=100,
             marker="s",
             label='contributions for example')

  # Legend.
  # Create a custom legend entry for the violin plots.
  ax.plot([0,0], [1,1], label='eval set contributions\ndistributions',
          color=face_color, alpha=alpha, linewidth=10)
  legend = ax.legend(loc='lower right', shadow=True, fontsize='x-large',
                     frameon=True)
  legend.get_frame().set_facecolor('white')

  # Format the plot.
  ax.set_yticks(np.arange(example.shape[0]))
  ax.set_yticklabels(example.index)
  ax.grid(False, axis='y')
  ax.set_xlabel('Contribution to predicted probability', size=14)
```
Plot this example.
```py
dist_violin_plot(df_dfc, ID)
plt.title('Feature contributions for example {}\n pred: {:1.2f}; label: {}'.format(ID, probs[ID], labels[ID]))
plt.show()
```
![png](img/c91d625a0312bd25acf8dab10ecb51ed.png)
Finally, third-party tools such as [LIME](https://github.com/marcotcr/lime) and [shap](https://github.com/slundberg/shap) can also help you understand a model's individual predictions.
## Global feature importances
Additionally, you may want to understand the model as a whole rather than studying individual predictions. Next, you will compute and use:
* Gain-based feature importances, via `est.experimental_feature_importances`
* Permutation feature importances
* Aggregated DFCs, via `est.experimental_predict_with_explanations`
Gain-based feature importances measure the change in loss when splitting on a particular feature, while permutation feature importances are computed by shuffling each feature's values one at a time on the evaluation set and observing the change in model performance.
In general, permutation feature importances are preferable to gain-based ones, though both methods can be unreliable when potential predictor variables differ in their scale of measurement or their number of categories, or when features are correlated ([source](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-307)). For a more thorough overview and a detailed discussion of the different types of feature importances, see [this article](http://explained.ai/rf-importance/index.html).
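The permutation procedure can be sketched without any estimator at all. A toy version, assuming NumPy only, where the `accuracy` function is a stand-in scoring model (not anything from this tutorial): y depends only on column 0, so shuffling column 0 should cause the largest drop in accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: the label depends only on column 0.
X = rng.normal(size=(500, 3))
y = (X[:, 0] > 0).astype(int)

def accuracy(X, y):
    """A stand-in 'model' that simply thresholds column 0."""
    return ((X[:, 0] > 0).astype(int) == y).mean()

baseline = accuracy(X, y)
importances = []
for col in range(X.shape[1]):
    # Shuffle one column, score, then measure the drop from baseline.
    X_perm = X.copy()
    X_perm[:, col] = rng.permutation(X_perm[:, col])
    importances.append(baseline - accuracy(X_perm, y))

print(np.argmax(importances))  # 0: column 0 is the most important feature
```

Shuffling an unused column leaves the score unchanged (importance 0), while shuffling the informative column destroys roughly half the predictions, which is exactly the signal the real `permutation_importances` helper below extracts from the trained estimator.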
### Gain-based feature importances
TensorFlow's Boosted Trees estimators have a built-in function, `est.experimental_feature_importances`, for computing gain-based feature importances.
```py
importances = est.experimental_feature_importances(normalize=True)
df_imp = pd.Series(importances)
# Visualize importances.
N = 8
ax = (df_imp.iloc[0:N][::-1]
      .plot(kind='barh',
            color=sns_colors[0],
            title='Gain feature importances',
            figsize=(10, 6)))
ax.grid(False, axis='y')
```
![png](img/11c5fe9ef9f8ed2389fe40e5fa1ccbb7.png)
### Average absolute DFCs
You can also average the absolute values of the DFCs to understand their impact at a global level.
```py
# Plot.
dfc_mean = df_dfc.abs().mean()
N = 8
sorted_ix = dfc_mean.abs().sort_values()[-N:].index  # Average and sort by absolute value.
ax = dfc_mean[sorted_ix].plot(kind='barh',
                              color=sns_colors[1],
                              title='Mean |directional feature contributions|',
                              figsize=(10, 6))
ax.grid(False, axis='y')
```
![png](img/edb8cf06303c60cf812dce4865e8d331.png)
You can also see how DFCs vary as the value of a feature changes.
```py
FEATURE = 'fare'
feature = pd.Series(df_dfc[FEATURE].values, index=dfeval[FEATURE].values).sort_index()
ax = sns.regplot(feature.index.values, feature.values, lowess=True)
ax.set_ylabel('contribution')
ax.set_xlabel(FEATURE)
ax.set_xlim(0, 100)
plt.show()
```
![png](img/dbd4a3a9bd5a14a61bcaf558a2231993.png)
### Permutation feature importances
```py
def permutation_importances(est, X_eval, y_eval, metric, features):
  """Column by column, shuffle the values and observe the effect on the eval set.

  A similar approach can be applied during training; see the "Drop-column
  importance" section at http://explained.ai/rf-importance/index.html.
  """
  baseline = metric(est, X_eval, y_eval)
  imp = []
  for col in features:
    save = X_eval[col].copy()
    X_eval[col] = np.random.permutation(X_eval[col])
    m = metric(est, X_eval, y_eval)
    X_eval[col] = save
    imp.append(baseline - m)
  return np.array(imp)

def accuracy_metric(est, X, y):
  """TensorFlow estimator accuracy."""
  eval_input_fn = make_input_fn(X,
                                y=y,
                                shuffle=False,
                                n_epochs=1)
  return est.evaluate(input_fn=eval_input_fn)['accuracy']

features = CATEGORICAL_COLUMNS + NUMERIC_COLUMNS
importances = permutation_importances(est, dfeval, y_eval, accuracy_metric,
                                      features)
df_imp = pd.Series(importances, index=features)

sorted_ix = df_imp.abs().sort_values().index
ax = df_imp[sorted_ix][-5:].plot(kind='barh', color=sns_colors[2], figsize=(10, 6))
ax.grid(False, axis='y')
ax.set_title('Permutation feature importance')
plt.show()
```
```py
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-03-09T21:21:18Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpec8e696f/model.ckpt-153
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference Time : 0.56113s
INFO:tensorflow:Finished evaluation at 2020-03-09-21:21:18
INFO:tensorflow:Saving dict for global step 153: accuracy = 0.8030303, accuracy_baseline = 0.625, auc = 0.8679216, auc_precision_recall = 0.8527449, average_loss = 0.4203342, global_step = 153, label/mean = 0.375, loss = 0.4203342, precision = 0.7473684, prediction/mean = 0.38673538, recall = 0.7171717
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 153: /tmp/tmpec8e696f/model.ckpt-153
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-03-09T21:21:19Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpec8e696f/model.ckpt-153
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference Time : 0.57949s
INFO:tensorflow:Finished evaluation at 2020-03-09-21:21:19
INFO:tensorflow:Saving dict for global step 153: accuracy = 0.6060606, accuracy_baseline = 0.625, auc = 0.64355683, auc_precision_recall = 0.5400543, average_loss = 0.74337494, global_step = 153, label/mean = 0.375, loss = 0.74337494, precision = 0.47524753, prediction/mean = 0.39103043, recall = 0.4848485
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 153: /tmp/tmpec8e696f/model.ckpt-153
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-03-09T21:21:20Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpec8e696f/model.ckpt-153
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference Time : 0.58528s
INFO:tensorflow:Finished evaluation at 2020-03-09-21:21:21
INFO:tensorflow:Saving dict for global step 153: accuracy = 0.7916667, accuracy_baseline = 0.625, auc = 0.8624732, auc_precision_recall = 0.8392693, average_loss = 0.43363357, global_step = 153, label/mean = 0.375, loss = 0.43363357, precision = 0.7244898, prediction/mean = 0.38975066, recall = 0.7171717
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 153: /tmp/tmpec8e696f/model.ckpt-153
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-03-09T21:21:21Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpec8e696f/model.ckpt-153
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference Time : 0.55600s
INFO:tensorflow:Finished evaluation at 2020-03-09-21:21:22
INFO:tensorflow:Saving dict for global step 153: accuracy = 0.8068182, accuracy_baseline = 0.625, auc = 0.8674931, auc_precision_recall = 0.85280114, average_loss = 0.4206087, global_step = 153, label/mean = 0.375, loss = 0.4206087, precision = 0.75, prediction/mean = 0.38792592, recall = 0.72727275
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 153: /tmp/tmpec8e696f/model.ckpt-153
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-03-09T21:21:22Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpec8e696f/model.ckpt-153
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference Time : 0.54454s
INFO:tensorflow:Finished evaluation at 2020-03-09-21:21:23
INFO:tensorflow:Saving dict for global step 153: accuracy = 0.72727275, accuracy_baseline = 0.625, auc = 0.76737064, auc_precision_recall = 0.62659556, average_loss = 0.6019534, global_step = 153, label/mean = 0.375, loss = 0.6019534, precision = 0.6626506, prediction/mean = 0.3688063, recall = 0.5555556
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 153: /tmp/tmpec8e696f/model.ckpt-153
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-03-09T21:21:24Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpec8e696f/model.ckpt-153
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference Time : 0.53149s
INFO:tensorflow:Finished evaluation at 2020-03-09-21:21:24
INFO:tensorflow:Saving dict for global step 153: accuracy = 0.7878788, accuracy_baseline = 0.625, auc = 0.8389348, auc_precision_recall = 0.8278463, average_loss = 0.45054114, global_step = 153, label/mean = 0.375, loss = 0.45054114, precision = 0.7263158, prediction/mean = 0.3912348, recall = 0.6969697
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 153: /tmp/tmpec8e696f/model.ckpt-153
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-03-09T21:21:25Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpec8e696f/model.ckpt-153
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference Time : 0.54399s
INFO:tensorflow:Finished evaluation at 2020-03-09-21:21:25
INFO:tensorflow:Saving dict for global step 153: accuracy = 0.8030303, accuracy_baseline = 0.625, auc = 0.862565, auc_precision_recall = 0.84412414, average_loss = 0.42553493, global_step = 153, label/mean = 0.375, loss = 0.42553493, precision = 0.75268817, prediction/mean = 0.37500647, recall = 0.7070707
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 153: /tmp/tmpec8e696f/model.ckpt-153
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-03-09T21:21:26Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpec8e696f/model.ckpt-153
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference Time : 0.56776s
INFO:tensorflow:Finished evaluation at 2020-03-09-21:21:26
INFO:tensorflow:Saving dict for global step 153: accuracy = 0.8030303, accuracy_baseline = 0.625, auc = 0.8679216, auc_precision_recall = 0.8527449, average_loss = 0.4203342, global_step = 153, label/mean = 0.375, loss = 0.4203342, precision = 0.7473684, prediction/mean = 0.38673538, recall = 0.7171717
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 153: /tmp/tmpec8e696f/model.ckpt-153
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-03-09T21:21:27Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpec8e696f/model.ckpt-153
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference Time : 0.56329s
INFO:tensorflow:Finished evaluation at 2020-03-09-21:21:28
INFO:tensorflow:Saving dict for global step 153: accuracy = 0.79924244, accuracy_baseline = 0.625, auc = 0.8132232, auc_precision_recall = 0.7860318, average_loss = 0.4787808, global_step = 153, label/mean = 0.375, loss = 0.4787808, precision = 0.7613636, prediction/mean = 0.37704408, recall = 0.67676765
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 153: /tmp/tmpec8e696f/model.ckpt-153
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-03-09T21:21:28Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpec8e696f/model.ckpt-153
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference Time : 0.60489s
INFO:tensorflow:Finished evaluation at 2020-03-09-21:21:29
INFO:tensorflow:Saving dict for global step 153: accuracy = 0.8030303, accuracy_baseline = 0.625, auc = 0.8360882, auc_precision_recall = 0.7940172, average_loss = 0.45960733, global_step = 153, label/mean = 0.375, loss = 0.45960733, precision = 0.7473684, prediction/mean = 0.38010252, recall = 0.7171717
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 153: /tmp/tmpec8e696f/model.ckpt-153
```
![png](img/3b5e2e711798f7ff0d6ff949ea4f54f3.png)
## Visualizing model fitting
First, construct the training data using the following formula:
$$z = x \cdot e^{-x^2 - y^2}$$
where $z$ is the dependent variable you are trying to predict, and $x$ and $y$ are the features.
```py
from numpy.random import uniform, seed
from scipy.interpolate import griddata
# Generate the data.
seed(0)
npts = 5000
x = uniform(-2, 2, npts)
y = uniform(-2, 2, npts)
z = x*np.exp(-x**2 - y**2)
xy = np.zeros((2,np.size(x)))
xy[0] = x
xy[1] = y
xy = xy.T
```
```py
# Prepare the data for training.
df = pd.DataFrame({'x': x, 'y': y, 'z': z})
xi = np.linspace(-2.0, 2.0, 200),
yi = np.linspace(-2.1, 2.1, 210),
xi,yi = np.meshgrid(xi, yi)
df_predict = pd.DataFrame({
'x' : xi.flatten(),
'y' : yi.flatten(),
})
predict_shape = xi.shape
```
```py
def plot_contour(x, y, z, **kwargs):
  # Set up the figure.
  plt.figure(figsize=(10, 8))
  # Plot contours of the gridded data, adding dots at the nonuniform data points.
  CS = plt.contour(x, y, z, 15, linewidths=0.5, colors='k')
  CS = plt.contourf(x, y, z, 15,
                    vmax=abs(z).max(), vmin=-abs(z).max(), cmap='RdBu_r')
  plt.colorbar()  # Draw the colorbar.
  # Set the plot limits.
  plt.xlim(-2, 2)
  plt.ylim(-2, 2)
```
You can visualize the function; larger values appear in red.
```py
zi = griddata(xy, z, (xi, yi), method='linear', fill_value='0')
plot_contour(xi, yi, zi)
plt.scatter(df.x, df.y, marker='.')
plt.title('Contour on training data')
plt.show()
```
![png](img/02b2fc97a46c88c22ee2d11e8c28bf0d.png)
```py
fc = [tf.feature_column.numeric_column('x'),
      tf.feature_column.numeric_column('y')]
```
```py
def predict(est):
  """Predictions from a given estimator."""
  predict_input_fn = lambda: tf.data.Dataset.from_tensors(dict(df_predict))
  preds = np.array([p['predictions'][0] for p in est.predict(predict_input_fn)])
  return preds.reshape(predict_shape)
```
First, let's try fitting a linear model to the data.
```py
train_input_fn = make_input_fn(df, df.z)
est = tf.estimator.LinearRegressor(fc)
est.train(train_input_fn, max_steps=500);
```
```py
INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpd4fqobc9
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpd4fqobc9', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:From /tensorflow-2.1.0/python3.6/tensorflow_core/python/feature_column/feature_column_v2.py:518: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.add_weight` method instead.
WARNING:tensorflow:From /tensorflow-2.1.0/python3.6/tensorflow_core/python/keras/optimizer_v2/ftrl.py:143: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpd4fqobc9/model.ckpt.
INFO:tensorflow:loss = 0.023290718, step = 0
INFO:tensorflow:global_step/sec: 267.329
INFO:tensorflow:loss = 0.017512696, step = 100 (0.377 sec)
INFO:tensorflow:global_step/sec: 312.355
INFO:tensorflow:loss = 0.018098738, step = 200 (0.321 sec)
INFO:tensorflow:global_step/sec: 341.77
INFO:tensorflow:loss = 0.019927984, step = 300 (0.291 sec)
INFO:tensorflow:global_step/sec: 307.825
INFO:tensorflow:loss = 0.01797011, step = 400 (0.327 sec)
INFO:tensorflow:Saving checkpoints for 500 into /tmp/tmpd4fqobc9/model.ckpt.
INFO:tensorflow:Loss for final step: 0.019703189.
```
```py
plot_contour(xi, yi, predict(est))
```
```py
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:Layer linear/linear_model is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2. The layer has dtype float32 because its dtype defaults to floatx.
If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.
To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpd4fqobc9/model.ckpt-500
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
```
![png](img/2bc3a9da8c0e479bf906dd0c765549f4.png)
As you can see, the linear model does not fit the data well. Next, let's fit a gradient boosted trees (GBDT) model and see how it fits the function.
```py
n_trees = 37
est = tf.estimator.BoostedTreesRegressor(fc, n_batches_per_layer=1, n_trees=n_trees)
est.train(train_input_fn, max_steps=500)
clear_output()
plot_contour(xi, yi, predict(est))
plt.text(-1.8, 2.1, '# trees: {}'.format(n_trees), color='w', backgroundcolor='black', size=20)
plt.show()
```
```py
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmp3jae7fgc/model.ckpt-222
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
```
![png](img/60960a15d5ca50a1486f3c3f8c200635.png)
As the number of trees increases, the model's predictions get closer to the underlying function.
![](img/cb18ad8212a0648018238babc8fe2325.png)
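The behavior just described, where predictions approach the true function as trees are added, can be reproduced with a toy boosting loop. This is a from-scratch numpy sketch using depth-1 regression stumps, not TensorFlow's `BoostedTreesRegressor`; the sine target and the 0.5 learning rate are illustrative choices only:

```py
import numpy as np

def fit_stump(x, r):
  # Weak learner: the single threshold split that best fits the residuals r.
  best = None
  for t in np.unique(x)[:-1]:
    left, right = r[x <= t], r[x > t]
    sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
    if best is None or sse < best[0]:
      best = (sse, t, left.mean(), right.mean())
  _, t, lv, rv = best
  return lambda q: np.where(q <= t, lv, rv)

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 200)
y = np.sin(np.pi * x)          # stand-in target function

pred = np.zeros_like(y)
errors = []
for _ in range(30):            # "number of trees"
  stump = fit_stump(x, y - pred)
  pred += 0.5 * stump(x)       # shrinkage / learning rate
  errors.append(np.mean((y - pred) ** 2))

print(errors[0] > errors[-1])  # True: more trees, closer fit
```

Each stump fits the current residuals, so every added "tree" removes part of the remaining error, which is why the contour plots above sharpen as `n_trees` grows.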
## Conclusion
This tutorial showed how to interpret boosted tree models using directional feature contributions (DFCs) and several feature importance techniques. These methods help you understand how features influence a model's predictions. Finally, you can also study the decision surfaces of other models, together with the material here, to learn how boosted tree models fit a function.
# Create an Estimator from a Keras model
> Original: [https://tensorflow.google.cn/tutorials/estimator/keras_model_to_estimator](https://tensorflow.google.cn/tutorials/estimator/keras_model_to_estimator)
## Overview
TensorFlow Estimators are fully supported in TensorFlow, and can be created from new and existing [`tf.keras`](https://tensorflow.google.cn/api_docs/python/tf/keras) models. This tutorial contains a complete, minimal example of that process.
## Setup
```py
import tensorflow as tf
import numpy as np
import tensorflow_datasets as tfds
```
### Create a simple Keras model
In Keras, you assemble *layers* to build *models*. A model is (usually) a graph of layers. The most common type of model is a stack of layers: the [`tf.keras.Sequential`](https://tensorflow.google.cn/api_docs/python/tf/keras/Sequential) model.
Build a simple, fully-connected network (i.e. a multi-layer perceptron):
```py
model = tf.keras.models.Sequential([
  tf.keras.layers.Dense(16, activation='relu', input_shape=(4,)),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(3)
])
```
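As a rough illustration of what this stack computes at inference time, here is a framework-free numpy sketch of the forward pass. The weights below are randomly initialized stand-ins, not trained values, and dropout is inactive at inference so it is omitted:

```py
import numpy as np

rng = np.random.default_rng(0)

# Stand-in weights for Dense(16, relu) on 4 features and the Dense(3) head.
W1, b1 = rng.normal(size=(4, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 3)), np.zeros(3)

def forward(x):
  h = np.maximum(x @ W1 + b1, 0.0)  # Dense(16) followed by ReLU
  return h @ W2 + b2                # Dense(3) produces the logits

batch = rng.normal(size=(32, 4))    # e.g. a batch of 32 iris rows
print(forward(batch).shape)         # (32, 3)
```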
Compile the model and get a summary.
```py
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              optimizer='adam')
model.summary()
```
```py
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 16) 80
_________________________________________________________________
dropout (Dropout) (None, 16) 0
_________________________________________________________________
dense_1 (Dense) (None, 3) 51
=================================================================
Total params: 131
Trainable params: 131
Non-trainable params: 0
_________________________________________________________________
```
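The parameter counts in the summary follow directly from weights plus biases per `Dense` layer; a quick arithmetic check:

```py
# Each Dense layer holds an (n_in x n_out) kernel plus n_out biases.
def dense_params(n_in, n_out):
  return n_in * n_out + n_out

print(dense_params(4, 16))                        # 80, the first Dense layer
print(dense_params(16, 3))                        # 51, the output layer
print(dense_params(4, 16) + dense_params(16, 3))  # 131, the reported total
```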
### Create an input function
Use the [Datasets API](https://tensorflow.google.cn/guide/data) to scale to large datasets or multi-device training.
Estimators need control of when and how their input pipeline is built. To allow this, they require an "input function" or `input_fn`. The `Estimator` will call this function with no arguments. The `input_fn` must return a [`tf.data.Dataset`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset).
```py
def input_fn():
  split = tfds.Split.TRAIN
  dataset = tfds.load('iris', split=split, as_supervised=True)
  dataset = dataset.map(lambda features, labels: ({'dense_input': features}, labels))
  dataset = dataset.batch(32).repeat()
  return dataset
```
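The zero-argument requirement is satisfied by capturing any parameters in a closure. A framework-free sketch of that pattern (in the real tutorial the inner function must return a `tf.data.Dataset`; plain list slices stand in for batches here only to show the closure):

```py
def make_input_fn(data, batch_size):
  def input_fn():
    # A real input_fn would build and return a tf.data.Dataset here;
    # fixed-size slices stand in for batches in this sketch.
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
  return input_fn

fn = make_input_fn(list(range(10)), batch_size=4)
print(fn())  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```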
Test your `input_fn`:
```py
for features_batch, labels_batch in input_fn().take(1):
  print(features_batch)
  print(labels_batch)
```
```py
Downloading and preparing dataset iris/2.0.0 (download: 4.44 KiB, generated: Unknown size, total: 4.44 KiB) to /home/kbuilder/tensorflow_datasets/iris/2.0.0...
Shuffling and writing examples to /home/kbuilder/tensorflow_datasets/iris/2.0.0.incompleteQ29ZWS/iris-train.tfrecord
Dataset iris downloaded and prepared to /home/kbuilder/tensorflow_datasets/iris/2.0.0. Subsequent calls will reuse this data.
{'dense_input': <tf.Tensor: shape=(32, 4), dtype=float32, numpy=
array([[5.1, 3.4, 1.5, 0.2],
       [7.7, 3. , 6.1, 2.3],
       [5.7, 2.8, 4.5, 1.3],
       [6.8, 3.2, 5.9, 2.3],
       [5.2, 3.4, 1.4, 0.2],
       [5.6, 2.9, 3.6, 1.3],
       [5.5, 2.6, 4.4, 1.2],
       [5.5, 2.4, 3.7, 1. ],
       [4.6, 3.4, 1.4, 0.3],
       [7.7, 2.8, 6.7, 2. ],
       [7. , 3.2, 4.7, 1.4],
       [4.6, 3.2, 1.4, 0.2],
       [6.5, 3. , 5.2, 2. ],
       [5.5, 4.2, 1.4, 0.2],
       [5.4, 3.9, 1.3, 0.4],
       [5. , 3.5, 1.3, 0.3],
       [5.1, 3.8, 1.5, 0.3],
       [4.8, 3. , 1.4, 0.1],
       [6.5, 3. , 5.8, 2.2],
       [7.6, 3. , 6.6, 2.1],
       [6.7, 3.3, 5.7, 2.1],
       [7.9, 3.8, 6.4, 2. ],
       [6.7, 3. , 5.2, 2.3],
       [5.8, 4. , 1.2, 0.2],
       [6.3, 2.5, 5. , 1.9],
       [5. , 3. , 1.6, 0.2],
       [6.9, 3.1, 5.1, 2.3],
       [6.1, 3. , 4.6, 1.4],
       [5.8, 2.7, 4.1, 1. ],
       [5.2, 2.7, 3.9, 1.4],
       [6.7, 3. , 5. , 1.7],
       [5.7, 2.6, 3.5, 1. ]], dtype=float32)>}
tf.Tensor([0 2 1 2 0 1 1 1 0 2 1 0 2 0 0 0 0 0 2 2 2 2 2 0 2 0 2 1 1 1 1 1], shape=(32,), dtype=int64)
```
### Create an Estimator from the tf.keras model
A [`tf.keras.Model`](https://tensorflow.google.cn/api_docs/python/tf/keras/Model) can be trained with the [`tf.estimator`](https://tensorflow.google.cn/api_docs/python/tf/estimator) API by converting the model to a [`tf.estimator.Estimator`](https://tensorflow.google.cn/api_docs/python/tf/estimator/Estimator) object with [`tf.keras.estimator.model_to_estimator`](https://tensorflow.google.cn/api_docs/python/tf/keras/estimator/model_to_estimator).
```py
import tempfile
model_dir = tempfile.mkdtemp()
keras_estimator = tf.keras.estimator.model_to_estimator(
  keras_model=model, model_dir=model_dir)
```
```py
INFO:tensorflow:Using default config.
INFO:tensorflow:Using default config.
INFO:tensorflow:Using the Keras model provided.
INFO:tensorflow:Using the Keras model provided.
Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/keras.py:220: set_learning_phase (from tensorflow.python.keras.backend) is deprecated and will be removed after 2020-10-11.
Instructions for updating:
Simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/keras.py:220: set_learning_phase (from tensorflow.python.keras.backend) is deprecated and will be removed after 2020-10-11.
Instructions for updating:
Simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmp13998n2j', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmp13998n2j', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
```
Train and evaluate the Estimator.
```py
keras_estimator.train(input_fn=input_fn, steps=500)
eval_result = keras_estimator.evaluate(input_fn=input_fn, steps=10)
print('Eval result: {}'.format(eval_result))
```
```py
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Warm-starting with WarmStartSettings: WarmStartSettings(ckpt_to_initialize_from='/tmp/tmp13998n2j/keras/keras_model.ckpt', vars_to_warm_start='.*', var_name_to_vocab_info={}, var_name_to_prev_var_name={})
INFO:tensorflow:Warm-starting with WarmStartSettings: WarmStartSettings(ckpt_to_initialize_from='/tmp/tmp13998n2j/keras/keras_model.ckpt', vars_to_warm_start='.*', var_name_to_vocab_info={}, var_name_to_prev_var_name={})
INFO:tensorflow:Warm-starting from: /tmp/tmp13998n2j/keras/keras_model.ckpt
INFO:tensorflow:Warm-starting from: /tmp/tmp13998n2j/keras/keras_model.ckpt
INFO:tensorflow:Warm-starting variables only in TRAINABLE_VARIABLES.
INFO:tensorflow:Warm-starting variables only in TRAINABLE_VARIABLES.
INFO:tensorflow:Warm-started 4 variables.
INFO:tensorflow:Warm-started 4 variables.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmp13998n2j/model.ckpt.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmp13998n2j/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 1.5731332, step = 0
INFO:tensorflow:loss = 1.5731332, step = 0
INFO:tensorflow:global_step/sec: 444.326
INFO:tensorflow:global_step/sec: 444.326
INFO:tensorflow:loss = 0.79164267, step = 100 (0.227 sec)
INFO:tensorflow:loss = 0.79164267, step = 100 (0.227 sec)
INFO:tensorflow:global_step/sec: 515.459
INFO:tensorflow:global_step/sec: 515.459
INFO:tensorflow:loss = 0.5765847, step = 200 (0.193 sec)
INFO:tensorflow:loss = 0.5765847, step = 200 (0.193 sec)
INFO:tensorflow:global_step/sec: 518.855
INFO:tensorflow:global_step/sec: 518.855
INFO:tensorflow:loss = 0.48571444, step = 300 (0.193 sec)
INFO:tensorflow:loss = 0.48571444, step = 300 (0.193 sec)
INFO:tensorflow:global_step/sec: 527.318
INFO:tensorflow:global_step/sec: 527.318
INFO:tensorflow:loss = 0.3836534, step = 400 (0.190 sec)
INFO:tensorflow:loss = 0.3836534, step = 400 (0.190 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 500...
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 500...
INFO:tensorflow:Saving checkpoints for 500 into /tmp/tmp13998n2j/model.ckpt.
INFO:tensorflow:Saving checkpoints for 500 into /tmp/tmp13998n2j/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 500...
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 500...
INFO:tensorflow:Loss for final step: 0.46023262.
INFO:tensorflow:Loss for final step: 0.46023262.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_v1.py:2048: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_v1.py:2048: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-09-22T19:57:20Z
INFO:tensorflow:Starting evaluation at 2020-09-22T19:57:20Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmp13998n2j/model.ckpt-500
INFO:tensorflow:Restoring parameters from /tmp/tmp13998n2j/model.ckpt-500
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [1/10]
INFO:tensorflow:Evaluation [1/10]
INFO:tensorflow:Evaluation [2/10]
INFO:tensorflow:Evaluation [2/10]
INFO:tensorflow:Evaluation [3/10]
INFO:tensorflow:Evaluation [3/10]
INFO:tensorflow:Evaluation [4/10]
INFO:tensorflow:Evaluation [4/10]
INFO:tensorflow:Evaluation [5/10]
INFO:tensorflow:Evaluation [5/10]
INFO:tensorflow:Evaluation [6/10]
INFO:tensorflow:Evaluation [6/10]
INFO:tensorflow:Evaluation [7/10]
INFO:tensorflow:Evaluation [7/10]
INFO:tensorflow:Evaluation [8/10]
INFO:tensorflow:Evaluation [8/10]
INFO:tensorflow:Evaluation [9/10]
INFO:tensorflow:Evaluation [9/10]
INFO:tensorflow:Evaluation [10/10]
INFO:tensorflow:Evaluation [10/10]
INFO:tensorflow:Inference Time : 0.16498s
INFO:tensorflow:Inference Time : 0.16498s
INFO:tensorflow:Finished evaluation at 2020-09-22-19:57:20
INFO:tensorflow:Finished evaluation at 2020-09-22-19:57:20
INFO:tensorflow:Saving dict for global step 500: global_step = 500, loss = 0.33660004
INFO:tensorflow:Saving dict for global step 500: global_step = 500, loss = 0.33660004
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 500: /tmp/tmp13998n2j/model.ckpt-500
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 500: /tmp/tmp13998n2j/model.ckpt-500
Eval result: {'loss': 0.33660004, 'global_step': 500}
```
# Advanced
# Customization
# Customization basics: tensors and operations
> Original: [https://tensorflow.google.cn/tutorials/customization/basics](https://tensorflow.google.cn/tutorials/customization/basics)
This is an introductory TensorFlow tutorial that shows how to:
* Import the required package
* Create and use tensors
* Use GPU acceleration
* Demonstrate [`tf.data.Dataset`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset)
## Import TensorFlow
To get started, import the `tensorflow` module. As of TensorFlow 2, eager execution is turned on by default. This enables a more interactive frontend to TensorFlow, the details of which we will discuss much later.
```py
import tensorflow as tf
```
## Tensors
A Tensor is a multi-dimensional array. Similar to NumPy `ndarray` objects, [`tf.Tensor`](https://tensorflow.google.cn/api_docs/python/tf/Tensor) objects have a data type and a shape. Additionally, [`tf.Tensor`](https://tensorflow.google.cn/api_docs/python/tf/Tensor)s can reside in accelerator memory (like a GPU). TensorFlow offers a rich library of operations ([tf.add](https://tensorflow.google.cn/api_docs/python/tf/add), [tf.matmul](https://tensorflow.google.cn/api_docs/python/tf/matmul), [tf.linalg.inv](https://tensorflow.google.cn/api_docs/python/tf/linalg/inv) etc.) that consume and produce [`tf.Tensor`](https://tensorflow.google.cn/api_docs/python/tf/Tensor)s. These operations automatically convert native Python types, for example:
```py
print(tf.add(1, 2))
print(tf.add([1, 2], [3, 4]))
print(tf.square(5))
print(tf.reduce_sum([1, 2, 3]))
# Operator overloading is also supported
print(tf.square(2) + tf.square(3))
```
```py
tf.Tensor(3, shape=(), dtype=int32)
tf.Tensor([4 6], shape=(2,), dtype=int32)
tf.Tensor(25, shape=(), dtype=int32)
tf.Tensor(6, shape=(), dtype=int32)
tf.Tensor(13, shape=(), dtype=int32)
```
Each [`tf.Tensor`](https://tensorflow.google.cn/api_docs/python/tf/Tensor) has a shape and a datatype:
```py
x = tf.matmul([[1]], [[2, 3]])
print(x)
print(x.shape)
print(x.dtype)
```
```py
tf.Tensor([[2 3]], shape=(1, 2), dtype=int32)
(1, 2)
<dtype: 'int32'>
```
The most obvious differences between NumPy arrays and [`tf.Tensor`](https://tensorflow.google.cn/api_docs/python/tf/Tensor)s are:
1. Tensors can be backed by accelerator memory (like GPU, TPU).
2. Tensors are immutable.
### NumPy Compatibility
Converting between a TensorFlow [`tf.Tensor`](https://tensorflow.google.cn/api_docs/python/tf/Tensor) and a NumPy `ndarray` is easy:
* TensorFlow operations automatically convert NumPy ndarrays to Tensors.
* NumPy operations automatically convert Tensors to NumPy ndarrays.
Tensors are explicitly converted to NumPy ndarrays using their `.numpy()` method. These conversions are typically cheap since the array and [`tf.Tensor`](https://tensorflow.google.cn/api_docs/python/tf/Tensor) share the underlying memory representation, if possible. However, sharing the underlying representation isn't always possible since the [`tf.Tensor`](https://tensorflow.google.cn/api_docs/python/tf/Tensor) may be hosted in GPU memory while NumPy arrays are always backed by host memory, and the conversion involves a copy from GPU to host memory.
```py
import numpy as np
ndarray = np.ones([3, 3])
print("TensorFlow operations convert numpy arrays to Tensors automatically")
tensor = tf.multiply(ndarray, 42)
print(tensor)
print("And NumPy operations convert Tensors to numpy arrays automatically")
print(np.add(tensor, 1))
print("The .numpy() method explicitly converts a Tensor to a numpy array")
print(tensor.numpy())
```
```py
TensorFlow operations convert numpy arrays to Tensors automatically
tf.Tensor(
[[42. 42. 42.]
 [42. 42. 42.]
 [42. 42. 42.]], shape=(3, 3), dtype=float64)
And NumPy operations convert Tensors to numpy arrays automatically
[[43. 43. 43.]
 [43. 43. 43.]
 [43. 43. 43.]]
The .numpy() method explicitly converts a Tensor to a numpy array
[[42. 42. 42.]
 [42. 42. 42.]
 [42. 42. 42.]]
```
## GPU acceleration
Many TensorFlow operations are accelerated using the GPU for computation. Without any annotations, TensorFlow automatically decides whether to use the GPU or CPU for an operation—copying the tensor between CPU and GPU memory, if necessary. Tensors produced by an operation are typically backed by the memory of the device on which the operation executed, for example:
```py
x = tf.random.uniform([3, 3])
print("Is there a GPU available: "),
print(tf.config.experimental.list_physical_devices("GPU"))
print("Is the Tensor on GPU #0: "),
print(x.device.endswith('GPU:0'))
```
```py
Is there a GPU available:
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Is the Tensor on GPU #0:
True
```
### Device Names
The [`Tensor.device`](https://tensorflow.google.cn/api_docs/python/tf/Tensor#device) property provides a fully qualified string name of the device hosting the contents of the tensor. This name encodes many details, such as an identifier of the network address of the host on which this program is executing and the device within that host. This is required for distributed execution of a TensorFlow program. The string ends with `GPU:<N>` if the tensor is placed on the `N`-th GPU on the host.
### Explicit Device Placement
In TensorFlow, *placement* refers to how individual operations are assigned (placed on) a device for execution. As mentioned, when there is no explicit guidance provided, TensorFlow automatically decides which device to execute an operation and copies tensors to that device, if needed. However, TensorFlow operations can be explicitly placed on specific devices using the [`tf.device`](https://tensorflow.google.cn/api_docs/python/tf/device) context manager, for example:
```py
import time

def time_matmul(x):
  start = time.time()
  for loop in range(10):
    tf.matmul(x, x)
  result = time.time() - start
  print("10 loops: {:0.2f}ms".format(1000 * result))

# Force execution on CPU
print("On CPU:")
with tf.device("CPU:0"):
  x = tf.random.uniform([1000, 1000])
  assert x.device.endswith("CPU:0")
  time_matmul(x)

# Force execution on GPU #0 if available
if tf.config.experimental.list_physical_devices("GPU"):
  print("On GPU:")
  with tf.device("GPU:0"):  # Or GPU:1 for the 2nd GPU, GPU:2 for the 3rd etc.
    x = tf.random.uniform([1000, 1000])
    assert x.device.endswith("GPU:0")
    time_matmul(x)
```
```py
On CPU:
10 loops: 102.06ms
On GPU:
10 loops: 231.87ms
```
## Datasets
This section uses the [`tf.data.Dataset` API](https://tensorflow.google.cn/guide/datasets) to build a pipeline for feeding data to your model. The [`tf.data.Dataset`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset) API is used to build performant, complex input pipelines from simple, re-usable pieces that will feed your model's training or evaluation loops.
### Create a source `Dataset`
Create a *source* dataset using one of the factory functions like [`Dataset.from_tensors`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#from_tensors), [`Dataset.from_tensor_slices`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#from_tensor_slices), or using objects that read from files like [`TextLineDataset`](https://tensorflow.google.cn/api_docs/python/tf/data/TextLineDataset) or [`TFRecordDataset`](https://tensorflow.google.cn/api_docs/python/tf/data/TFRecordDataset). See the [TensorFlow Dataset guide](https://tensorflow.google.cn/guide/datasets#reading_input_data) for more information.
```py
ds_tensors = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6])
# Create a CSV file
import tempfile
_, filename = tempfile.mkstemp()
with open(filename, 'w') as f:
  f.write("""Line 1
Line 2
Line 3
""")
ds_file = tf.data.TextLineDataset(filename)
```
### Apply transformations
Use the transformations functions like [`map`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#map), [`batch`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#batch), and [`shuffle`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#shuffle) to apply transformations to dataset records.
```py
ds_tensors = ds_tensors.map(tf.square).shuffle(2).batch(2)
ds_file = ds_file.batch(2)
```
### Iterate
[`tf.data.Dataset`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset) objects support iteration to loop over records:
```py
print('Elements of ds_tensors:')
for x in ds_tensors:
  print(x)

print('\nElements in ds_file:')
for x in ds_file:
  print(x)
```
```py
Elements of ds_tensors:
tf.Tensor([1 4], shape=(2,), dtype=int32)
tf.Tensor([16 9], shape=(2,), dtype=int32)
tf.Tensor([25 36], shape=(2,), dtype=int32)
Elements in ds_file:
tf.Tensor([b'Line 1' b'Line 2'], shape=(2,), dtype=string)
tf.Tensor([b'Line 3' b' '], shape=(2,), dtype=string)
```
# Custom layers
> Original: [https://tensorflow.google.cn/tutorials/customization/custom_layers](https://tensorflow.google.cn/tutorials/customization/custom_layers)
We recommend using [`tf.keras`](https://tensorflow.google.cn/api_docs/python/tf/keras) as a high-level API for building neural networks. That said, most TensorFlow APIs are usable with eager execution.
```py
import tensorflow as tf
```
```py
print(tf.test.is_gpu_available())
```
```py
WARNING:tensorflow:From <ipython-input-3-ae932be897c3>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
True
```
## Layers: common sets of useful operations
Most of the time when writing code for machine learning models you want to operate at a higher level of abstraction than individual operations and manipulation of individual variables.
Many machine learning models are expressible as the composition and stacking of relatively simple layers, and TensorFlow provides both a set of many common layers as well as easy ways for you to write your own application-specific layers, either from scratch or as the composition of existing layers.
TensorFlow includes the full [Keras](https://keras.io) API in the tf.keras package, and the Keras layers are very useful when building your own models.
```py
# In the tf.keras.layers package, layers are objects. To construct a layer,
# simply construct the object. Most layers take as a first argument the number
# of output dimensions / channels.
layer = tf.keras.layers.Dense(100)
# The number of input dimensions is often unnecessary, as it can be inferred
# the first time the layer is used, but it can be provided if you want to
# specify it manually, which is useful in some complex models.
layer = tf.keras.layers.Dense(10, input_shape=(None, 5))
```
The full list of pre-existing layers can be seen in [the documentation](https://tensorflow.google.cn/api_docs/python/tf/keras/layers). It includes Dense (a fully-connected layer), Conv2D, LSTM, BatchNormalization, Dropout, and many others.
```py
# To use a layer, simply call it.
layer(tf.zeros([10, 5]))
```
```py
<tf.Tensor: shape=(10, 10), dtype=float32, numpy=
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)>
```
```py
# Layers have many useful methods. For example, you can inspect all variables
# in a layer using `layer.variables` and trainable variables using
# `layer.trainable_variables`. In this case a fully-connected layer
# will have variables for weights and biases.
layer.variables
```
```py
[<tf.Variable 'dense_1/kernel:0' shape=(5, 10) dtype=float32, numpy=
array([[-0.15722859, 0.57974607, -0.6042197 , -0.04509938, -0.34154978,
0.5545538 , -0.05465943, 0.41898602, 0.01103759, 0.3038023 ],
[ 0.02127045, -0.5874406 , -0.46126658, 0.44600803, 0.25224942,
-0.24498063, 0.16537589, -0.2237429 , -0.4222283 , -0.29941237],
[ 0.30734265, 0.6019073 , -0.4399919 , -0.35211664, -0.02590752,
-0.34433138, 0.26751322, 0.00731838, -0.04928106, -0.5188436 ],
[ 0.25729483, -0.15926728, -0.03268623, 0.36698097, -0.45867646,
0.02833885, -0.49959266, 0.09508026, -0.01607442, -0.10307193],
[ 0.33573806, 0.45685798, 0.21133131, 0.4112534 , 0.51482946,
0.5442372 , 0.21336573, 0.57636994, -0.40508842, 0.15163761]],
dtype=float32)>,
<tf.Variable 'dense_1/bias:0' shape=(10,) dtype=float32, numpy=array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)>]
```
```py
# The variables are also accessible through nice accessors
layer.kernel, layer.bias
```
```py
(<tf.Variable 'dense_1/kernel:0' shape=(5, 10) dtype=float32, numpy=
array([[-0.15722859, 0.57974607, -0.6042197 , -0.04509938, -0.34154978,
0.5545538 , -0.05465943, 0.41898602, 0.01103759, 0.3038023 ],
[ 0.02127045, -0.5874406 , -0.46126658, 0.44600803, 0.25224942,
-0.24498063, 0.16537589, -0.2237429 , -0.4222283 , -0.29941237],
[ 0.30734265, 0.6019073 , -0.4399919 , -0.35211664, -0.02590752,
-0.34433138, 0.26751322, 0.00731838, -0.04928106, -0.5188436 ],
[ 0.25729483, -0.15926728, -0.03268623, 0.36698097, -0.45867646,
0.02833885, -0.49959266, 0.09508026, -0.01607442, -0.10307193],
[ 0.33573806, 0.45685798, 0.21133131, 0.4112534 , 0.51482946,
0.5442372 , 0.21336573, 0.57636994, -0.40508842, 0.15163761]],
dtype=float32)>,
<tf.Variable 'dense_1/bias:0' shape=(10,) dtype=float32, numpy=array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)>)
```
## Implementing custom layers
The best way to implement your own layer is extending the [`tf.keras.layers.Layer`](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/Layer) class and implementing:
1. `__init__` , where you can do all input-independent initialization
2. `build`, where you know the shapes of the input tensors and can do the rest of the initialization
3. `call`, where you do the forward computation
Note that you don't have to wait until `build` is called to create your variables, you can also create them in `__init__`. However, the advantage of creating them in `build` is that it enables late variable creation based on the shape of the inputs the layer will operate on. On the other hand, creating variables in `__init__` would mean that shapes required to create the variables will need to be explicitly specified.
```py
class MyDenseLayer(tf.keras.layers.Layer):
  def __init__(self, num_outputs):
    super(MyDenseLayer, self).__init__()
    self.num_outputs = num_outputs

  def build(self, input_shape):
    self.kernel = self.add_weight("kernel",
                                  shape=[int(input_shape[-1]),
                                         self.num_outputs])

  def call(self, input):
    return tf.matmul(input, self.kernel)

layer = MyDenseLayer(10)
```
```py
_ = layer(tf.zeros([10, 5])) # Calling the layer builds it.
```
```py
print([var.name for var in layer.trainable_variables])
```
```py
['my_dense_layer/kernel:0']
```
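The `build`-versus-`__init__` trade-off can also be seen without TensorFlow. Below is a framework-free sketch (class and method names are illustrative, not part of any API) of a layer that defers allocating its kernel until the first call, when the input width becomes known:

```py
import random

class DeferredDense:
  """A framework-free sketch of Keras-style deferred weight creation."""
  def __init__(self, num_outputs):
    self.num_outputs = num_outputs
    self.kernel = None  # shape unknown until the first input arrives

  def build(self, input_dim):
    # Late creation: the kernel shape depends on the input width.
    self.kernel = [[random.random() for _ in range(self.num_outputs)]
                   for _ in range(input_dim)]

  def __call__(self, batch):
    if self.kernel is None:
      self.build(len(batch[0]))  # infer input_dim on the first call
    # Plain matrix multiply: (batch_size, input_dim) x (input_dim, num_outputs)
    return [[sum(x * w for x, w in zip(row, col))
             for col in zip(*self.kernel)]
            for row in batch]

layer = DeferredDense(10)
out = layer([[0.0] * 5] * 3)  # input_dim=5 is discovered here, not at construction
```

Creating the kernel in `__init__` instead would force the caller to pass the input width up front, which is exactly the trade-off described above.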
Overall code is easier to read and maintain if it uses standard layers whenever possible, as other readers will be familiar with the behavior of standard layers. If you want to use a layer which is not present in [`tf.keras.layers`](https://tensorflow.google.cn/api_docs/python/tf/keras/layers), consider filing a [github issue](http://github.com/tensorflow/tensorflow/issues/new) or, even better, sending us a pull request!
## Models: Composing layers
Many interesting layer-like things in machine learning models are implemented by composing existing layers. For example, each residual block in a resnet is a composition of convolutions, batch normalizations, and a shortcut. Layers can be nested inside other layers.
Typically you inherit from [`keras.Model`](https://tensorflow.google.cn/api_docs/python/tf/keras/Model) when you need the model methods like [`Model.fit`](https://tensorflow.google.cn/api_docs/python/tf/keras/Model#fit), [`Model.evaluate`](https://tensorflow.google.cn/api_docs/python/tf/keras/Model#evaluate), and [`Model.save`](https://tensorflow.google.cn/api_docs/python/tf/keras/Model#save) (see [Custom Keras layers and models](https://tensorflow.google.cn/guide/keras/custom_layers_and_models) for details).
One other feature provided by [`keras.Model`](https://tensorflow.google.cn/api_docs/python/tf/keras/Model) (instead of [`keras.layers.Layer`](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/Layer)) is that in addition to tracking variables, a [`keras.Model`](https://tensorflow.google.cn/api_docs/python/tf/keras/Model) also tracks its internal layers, making them easier to inspect.
For example, here is a ResNet block:
```py
class ResnetIdentityBlock(tf.keras.Model):
  def __init__(self, kernel_size, filters):
    super(ResnetIdentityBlock, self).__init__(name='')
    filters1, filters2, filters3 = filters

    self.conv2a = tf.keras.layers.Conv2D(filters1, (1, 1))
    self.bn2a = tf.keras.layers.BatchNormalization()

    self.conv2b = tf.keras.layers.Conv2D(filters2, kernel_size, padding='same')
    self.bn2b = tf.keras.layers.BatchNormalization()

    self.conv2c = tf.keras.layers.Conv2D(filters3, (1, 1))
    self.bn2c = tf.keras.layers.BatchNormalization()

  def call(self, input_tensor, training=False):
    x = self.conv2a(input_tensor)
    x = self.bn2a(x, training=training)
    x = tf.nn.relu(x)

    x = self.conv2b(x)
    x = self.bn2b(x, training=training)
    x = tf.nn.relu(x)

    x = self.conv2c(x)
    x = self.bn2c(x, training=training)

    x += input_tensor
    return tf.nn.relu(x)

block = ResnetIdentityBlock(1, [1, 2, 3])
```
```py
_ = block(tf.zeros([1, 2, 3, 3]))
```
```py
block.layers
```
```py
[<tensorflow.python.keras.layers.convolutional.Conv2D at 0x7f98d15a9c18>,
<tensorflow.python.keras.layers.normalization_v2.BatchNormalization at 0x7f99303a1e80>,
<tensorflow.python.keras.layers.convolutional.Conv2D at 0x7f98d15a7b00>,
<tensorflow.python.keras.layers.normalization_v2.BatchNormalization at 0x7f98d15a7860>,
<tensorflow.python.keras.layers.convolutional.Conv2D at 0x7f98d15a7630>,
<tensorflow.python.keras.layers.normalization_v2.BatchNormalization at 0x7f98d15a7390>]
```
```py
len(block.variables)
```
```py
18
```
```py
block.summary()
```
```py
Model: "resnet_identity_block"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) multiple 4
_________________________________________________________________
batch_normalization (BatchNo multiple 4
_________________________________________________________________
conv2d_1 (Conv2D) multiple 4
_________________________________________________________________
batch_normalization_1 (Batch multiple 8
_________________________________________________________________
conv2d_2 (Conv2D) multiple 9
_________________________________________________________________
batch_normalization_2 (Batch multiple 12
=================================================================
Total params: 41
Trainable params: 29
Non-trainable params: 12
_________________________________________________________________
```
Much of the time, however, models which compose many layers simply call one layer after the other. This can be done in very little code using [`tf.keras.Sequential`](https://tensorflow.google.cn/api_docs/python/tf/keras/Sequential):
```py
my_seq = tf.keras.Sequential([tf.keras.layers.Conv2D(1, (1, 1),
                                                     input_shape=(None, None, 3)),
                              tf.keras.layers.BatchNormalization(),
                              tf.keras.layers.Conv2D(2, 1, padding='same'),
                              tf.keras.layers.BatchNormalization(),
                              tf.keras.layers.Conv2D(3, (1, 1)),
                              tf.keras.layers.BatchNormalization()])
my_seq(tf.zeros([1, 2, 3, 3]))
```
```py
<tf.Tensor: shape=(1, 2, 3, 3), dtype=float32, numpy=
array([[[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]],
[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]]]], dtype=float32)>
```
```py
my_seq.summary()
```
```py
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_3 (Conv2D) (None, None, None, 1) 4
_________________________________________________________________
batch_normalization_3 (Batch (None, None, None, 1) 4
_________________________________________________________________
conv2d_4 (Conv2D) (None, None, None, 2) 4
_________________________________________________________________
batch_normalization_4 (Batch (None, None, None, 2) 8
_________________________________________________________________
conv2d_5 (Conv2D) (None, None, None, 3) 9
_________________________________________________________________
batch_normalization_5 (Batch (None, None, None, 3) 12
=================================================================
Total params: 41
Trainable params: 29
Non-trainable params: 12
_________________________________________________________________
```
## Next steps
Now you can go back to the previous notebook and adapt the linear regression example to use layers and models, so that it is better structured.
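As a reminder of what that training loop computes, here is a minimal framework-free sketch of full-batch gradient descent on a linear model (the function name and data are illustrative); the layers-and-models version wraps the same arithmetic in `tf.keras` objects:

```py
def train_linear_regression(xs, ys, lr=0.1, epochs=200):
  """Fit y = w*x + b by full-batch gradient descent on mean squared error."""
  w, b = 0.0, 0.0
  n = len(xs)
  for _ in range(epochs):
    # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    w -= lr * grad_w
    b -= lr * grad_b
  return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0, 5.0, 8.0, 11.0]  # generated by y = 3x + 2, no noise
w, b = train_linear_regression(xs, ys)
```

With layers, the kernel and bias become layer variables, the loss becomes a `tf.keras.losses` object, and the update becomes an optimizer call, but the structure is the same.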
@@ -1,626 +0,0 @@
# Custom training: walkthrough
> Original: [https://tensorflow.google.cn/tutorials/customization/custom_training_walkthrough](https://tensorflow.google.cn/tutorials/customization/custom_training_walkthrough)
This tutorial uses machine learning to categorize iris flowers by species. It uses TensorFlow to:
1. Build a model,
2. Train this model on example data, and
3. Use the model to make predictions about unknown data.
## TensorFlow programming
This guide uses these high-level TensorFlow concepts:
* Use TensorFlow's default [eager execution](https://tensorflow.google.cn/guide/eager) development environment,
* Import data with the [Datasets API](https://tensorflow.google.cn/guide/datasets),
* Build layers and the full model with TensorFlow's [Keras API](https://keras.io/getting-started/sequential-model-guide/).
This tutorial is structured like many TensorFlow programs:
1. Import and parse the dataset
2. Select the type of model
3. Train the model
4. Evaluate the model's effectiveness
5. Use the trained model to make predictions
## Setup
### Configure imports
Import TensorFlow and the other required Python libraries. By default, TensorFlow uses [eager execution](https://tensorflow.google.cn/guide/eager) to evaluate operations immediately, returning concrete values instead of building a [computational graph](https://tensorflow.google.cn/guide/graphs) that is executed later. If you are used to a REPL or the python interactive console, this will feel familiar.
```py
import os
import matplotlib.pyplot as plt
```
```py
import tensorflow as tf
```
```py
print("TensorFlow version: {}".format(tf.__version__))
print("Eager execution: {}".format(tf.executing_eagerly()))
```
```py
TensorFlow version: 2.3.0
Eager execution: True
```
## The Iris classification problem
Imagine you are a botanist seeking an automated way to categorize each iris flower you find. Machine learning provides many algorithms to classify flowers statistically. For instance, a sophisticated machine learning program could classify flowers based on photographs. Our ambitions are more modest: we are going to classify iris flowers based on the length and width of their sepals and petals.
The Iris genus entails about 300 species, but our program will only classify the following three:
* Iris setosa
* Iris virginica
* Iris versicolor
| ![Petal geometry compared for three iris species: Iris setosa, Iris virginica, and Iris versicolor](img/bb63d10882d3aa9a631d3cf50ff7f21e.png) |
| **Figure 1.** [Iris setosa](https://commons.wikimedia.org/w/index.php?curid=170298) (by [Radomil](https://commons.wikimedia.org/wiki/User:Radomil), CC BY-SA 3.0), [Iris versicolor](https://commons.wikimedia.org/w/index.php?curid=248095) (by [Dlanglois](https://commons.wikimedia.org/wiki/User:Dlanglois), CC BY-SA 3.0), and [Iris virginica](https://www.flickr.com/photos/33397993@N05/3352169862) (by [Frank Mayfield](https://www.flickr.com/photos/33397993@N05), CC BY-SA 2.0).
  |
Fortunately, someone has already created a [dataset of 120 iris flowers](https://en.wikipedia.org/wiki/Iris_flower_data_set) with the sepal and petal measurements. This is a classic dataset that is popular for beginner machine learning classification problems.
## Import and parse the training dataset
Download the dataset file and convert it into a structure that can be used by this Python program.
### Download the dataset
Download the training dataset file using the [tf.keras.utils.get_file](https://tensorflow.google.cn/api_docs/python/tf/keras/utils/get_file) function. This returns the file path of the downloaded file:
```py
train_dataset_url = "https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv"
train_dataset_fp = tf.keras.utils.get_file(fname=os.path.basename(train_dataset_url),
                                           origin=train_dataset_url)

print("Local copy of the dataset file: {}".format(train_dataset_fp))
```
```py
Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv
8192/2194 [================================================================================================================] - 0s 0us/step
Local copy of the dataset file: /home/kbuilder/.keras/datasets/iris_training.csv
```
### Inspect the data
This dataset, `iris_training.csv`, is a plain text file that stores tabular data formatted as comma-separated values (CSV). Use the `head -n5` command to take a peek at the first five entries:
```py
head -n5 {train_dataset_fp}
```
```py
120,4,setosa,versicolor,virginica
6.4,2.8,5.6,2.2,2
5.0,2.3,3.3,1.0,1
4.9,2.5,4.5,1.7,2
4.9,3.1,1.5,0.1,0
```
From this view of the dataset, notice the following:
1. The first line is a header containing information about the dataset:
* There are 120 total examples. Each example has four features and one of three possible label names.
* Subsequent rows are data records, one [example](https://developers.google.cn/machine-learning/glossary/#example) per line, where:
* The first four fields are [features](https://developers.google.cn/machine-learning/glossary/#feature): these are the characteristics of an example. Here, the fields hold float numbers representing flower measurements.
* The last column is the [label](https://developers.google.cn/machine-learning/glossary/#label): this is the value we want to predict. For this dataset, it is an integer value of 0, 1, or 2 that corresponds to a flower name.
Let's write that out in code:
```py
# column order in the CSV file
column_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
feature_names = column_names[:-1]
label_name = column_names[-1]
print("Features: {}".format(feature_names))
print("Label: {}".format(label_name))
```
```py
Features: ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
Label: species
```
Each label is associated with a string name (for example, "setosa"), but machine learning typically relies on numeric values. The label numbers are mapped to a named representation, such as:
* `0`: Iris setosa
* `1`: Iris versicolor
* `2`: Iris virginica
For more information about features and labels, see the [ML Terminology section of the Machine Learning Crash Course](https://developers.google.cn/machine-learning/crash-course/framing/ml-terminology).
```py
class_names = ['Iris setosa', 'Iris versicolor', 'Iris virginica']
```
### Create a [`tf.data.Dataset`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset)
TensorFlow's [Dataset API](https://tensorflow.google.cn/guide/datasets) handles many common cases for loading data into a model. This is a high-level API for reading data and transforming it into a form used for training. See the [Datasets Quick Start guide](https://tensorflow.google.cn/get_started/datasets_quickstart) for more information.
Since the dataset is a CSV-formatted text file, use the [make_csv_dataset](https://tensorflow.google.cn/api_docs/python/tf/data/experimental/make_csv_dataset) function to parse the data into a suitable format. Since this function generates data for training models, the default behavior is to shuffle the data (`shuffle=True, shuffle_buffer_size=10000`) and repeat the dataset forever (`num_epochs=None`). We also set the [batch_size](https://developers.google.cn/machine-learning/glossary/#batch_size) parameter:
```py
batch_size = 32

train_dataset = tf.data.experimental.make_csv_dataset(
    train_dataset_fp,
    batch_size,
    column_names=column_names,
    label_name=label_name,
    num_epochs=1)
```
The `make_csv_dataset` function returns a [`tf.data.Dataset`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset) of `(features, label)` pairs, where `features` is a dictionary: `{'feature_name': value}`.
These `Dataset` objects are iterable. Let's look at a batch of features:
```py
features, labels = next(iter(train_dataset))
print(features)
```
```py
OrderedDict([('sepal_length', <tf.Tensor: shape=(32,), dtype=float32, numpy=
array([6.6, 5.8, 5. , 7.7, 4.6, 4.7, 5.5, 6.1, 6.5, 6.1, 5. , 6.4, 5.4,
       6. , 5.5, 7.2, 5.9, 6.4, 5. , 5.2, 5. , 6.4, 6.2, 5.1, 6.4, 5.8,
       5.1, 6.3, 6.5, 4.9, 7.4, 5.7], dtype=float32)>), ('sepal_width', <tf.Tensor: shape=(32,), dtype=float32, numpy=
array([2.9, 2.7, 3.4, 2.6, 3.1, 3.2, 2.4, 2.9, 3. , 2.6, 3.5, 3.1, 3.9,
       3. , 2.4, 3.6, 3.2, 3.2, 3.2, 3.5, 2.3, 2.7, 3.4, 3.8, 2.8, 2.6,
       2.5, 3.3, 3. , 3.1, 2.8, 3.8], dtype=float32)>), ('petal_length', <tf.Tensor: shape=(32,), dtype=float32, numpy=
array([4.6, 4.1, 1.5, 6.9, 1.5, 1.6, 3.8, 4.7, 5.5, 5.6, 1.6, 5.5, 1.7,
       4.8, 3.7, 6.1, 4.8, 4.5, 1.2, 1.5, 3.3, 5.3, 5.4, 1.9, 5.6, 4. ,
       3. , 6. , 5.8, 1.5, 6.1, 1.7], dtype=float32)>), ('petal_width', <tf.Tensor: shape=(32,), dtype=float32, numpy=
array([1.3, 1. , 0.2, 2.3, 0.2, 0.2, 1.1, 1.4, 1.8, 1.4, 0.6, 1.8, 0.4,
       1.8, 1. , 2.5, 1.8, 1.5, 0.2, 0.2, 1. , 1.9, 2.3, 0.4, 2.2, 1.2,
       1.1, 2.5, 2.2, 0.1, 1.9, 0.3], dtype=float32)>)])
```
Notice that examples with like features are grouped together, or batched. Change the `batch_size` to set the number of examples stored in these feature arrays.
You can start to see some clusters by plotting a few features from the batch:
```py
plt.scatter(features['petal_length'],
            features['sepal_length'],
            c=labels,
            cmap='viridis')
plt.xlabel("Petal length")
plt.ylabel("Sepal length")
plt.show()
```
![png](img/6396c35912fab965e30d9adf6c7c8981.png)
To simplify the model building step, create a function to repackage the features dictionary into a single array with shape `(batch_size, num_features)`.
This function uses the [tf.stack](https://tensorflow.google.cn/api_docs/python/tf/stack) method, which takes values from a list of tensors and creates a combined tensor at the specified dimension:
```py
def pack_features_vector(features, labels):
  """Pack the features into a single array."""
  features = tf.stack(list(features.values()), axis=1)
  return features, labels
```
Then use the [tf.data.Dataset.map](https://tensorflow.google.cn/api_docs/python/tf/data/dataset/map) method to pack the `features` of each `(features, label)` pair into the training dataset:
```py
train_dataset = train_dataset.map(pack_features_vector)
```
The features element of the `Dataset` are now arrays with shape `(batch_size, num_features)`. Let's look at the first few examples:
```py
features, labels = next(iter(train_dataset))
print(features[:5])
```
```py
tf.Tensor(
[[5.  3.5 1.3 0.3]
 [4.8 3.1 1.6 0.2]
 [6.3 2.7 4.9 1.8]
 [7.4 2.8 6.1 1.9]
 [5.  3.2 1.2 0.2]], shape=(5, 4), dtype=float32)
```
## Select the type of model
### Why model?
A [model](https://developers.google.cn/machine-learning/crash-course/glossary#model) is a relationship between features and the label. For the Iris classification problem, the model defines the relationship between the sepal and petal measurements and the predicted Iris species. Some simple models can be described with a few lines of algebra, but complex machine learning models have a large number of parameters that are difficult to summarize.
Could you determine the relationship between the four features and the Iris species *without* using machine learning? That is, could you use traditional programming techniques (for example, a lot of conditional statements) to create a model? Perhaps, if you analyzed the dataset long enough to determine the relationships between petal and sepal measurements and a particular species. This becomes difficult, maybe impossible, on more complicated datasets. A good machine learning approach determines the model for you: if you feed enough representative examples into the right machine learning model type, the program figures out the relationships for you.
### Select the model
We need to select the kind of model to train. There are many types of models, and picking a good one takes experience. This tutorial uses a neural network to solve the Iris classification problem. [Neural networks](https://developers.google.cn/machine-learning/glossary/#neural_network) can find complex relationships between features and the label. A neural network is a highly-structured graph, organized into one or more [hidden layers](https://developers.google.cn/machine-learning/glossary/#hidden_layer). Each hidden layer consists of one or more [neurons](https://developers.google.cn/machine-learning/glossary/#neuron). There are several categories of neural networks; this program uses a dense, or [fully-connected neural network](https://developers.google.cn/machine-learning/glossary/#fully_connected_layer): the neurons in one layer receive input connections from every neuron in the previous layer. For example, Figure 2 illustrates a dense neural network consisting of an input layer, two hidden layers, and an output layer:
| ![A diagram of the network architecture: an input layer, 2 hidden layers, and an output layer](img/d6c8610603858ddd864cc7f024f16e40.png) |
| **Figure 2.** A neural network with features, hidden layers, and predictions.
  |
When the model from Figure 2 is trained and fed an unlabeled example, it yields three predictions: the likelihood that this flower is each of the given Iris species. This prediction is called [inference](https://developers.google.cn/machine-learning/crash-course/glossary#inference). For this example, the sum of the output predictions is 1.0. In Figure 2, this prediction breaks down as: 0.02 for Iris setosa, 0.95 for Iris versicolor, and 0.03 for Iris virginica. This means that the model predicts, with 95% probability, that an unlabeled example flower is an Iris versicolor.
### Create a model using Keras
The TensorFlow [tf.keras](https://tensorflow.google.cn/api_docs/python/tf/keras) API is the preferred way to create models and layers. This makes it easy to build models and experiment, while Keras handles the complexity of connecting everything together.
The [tf.keras.Sequential](https://tensorflow.google.cn/api_docs/python/tf/keras/Sequential) model is a linear stack of layers. Its constructor takes a list of layer instances, in this case two [Dense](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/Dense) layers with 10 nodes each, and an output layer with 3 nodes representing our label predictions. The first layer's `input_shape` parameter corresponds to the number of features from the dataset, and is required:
```py
model = tf.keras.Sequential([
  tf.keras.layers.Dense(10, activation=tf.nn.relu, input_shape=(4,)),  # input shape required
  tf.keras.layers.Dense(10, activation=tf.nn.relu),
  tf.keras.layers.Dense(3)
])
```
The [activation function](https://developers.google.cn/machine-learning/crash-course/glossary#activation_function) determines the output shape of each node in the layer. These non-linearities are important; without them the model would be equivalent to a single layer. There are many [available activations](https://tensorflow.google.cn/api_docs/python/tf/keras/activations), but [ReLU](https://developers.google.cn/machine-learning/crash-course/glossary#ReLU) is common for hidden layers.
The ideal number of hidden layers and neurons depends on the problem and the dataset. Like many aspects of machine learning, picking the best shape of the neural network requires a mixture of knowledge and experimentation. As a rule of thumb, increasing the number of hidden layers and neurons typically creates a more powerful model, which requires more data to train effectively.
### Using the model
Let's have a quick look at what this model does to a batch of features:
```py
predictions = model(features)
predictions[:5]
```
```py
<tf.Tensor: shape=(5, 3), dtype=float32, numpy=
array([[ 2.371686 , -3.2583737 , 0.06788294],
[ 2.1781201 , -3.0004797 , 0.07583394],
[ 1.4679078 , -2.8879187 , -0.13730617],
[ 1.60235 , -3.2915173 , -0.18439294],
[ 2.3404026 , -3.2052171 , 0.06615102]], dtype=float32)>
```
Here, each example returns a [logit](https://developers.google.cn/machine-learning/crash-course/glossary#logits) for each class.
To convert these logits to a probability for each class, use the [softmax](https://developers.google.cn/machine-learning/crash-course/glossary#softmax) function:
```py
tf.nn.softmax(predictions[:5])
```
```py
<tf.Tensor: shape=(5, 3), dtype=float32, numpy=
array([[0.9062349 , 0.0032519 , 0.09051319],
[0.88667214, 0.00499719, 0.10833076],
[0.8239415 , 0.01057268, 0.16548584],
[0.85106575, 0.00637652, 0.14255764],
[0.90352327, 0.00352783, 0.09294892]], dtype=float32)>
```
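To see what `tf.nn.softmax` computes for each row of logits, here is the same transformation written out in plain Python (a sketch; the max-subtraction is the usual numerical-stability trick and does not change the result):

```py
import math

def softmax(logits):
  # Subtracting the max leaves the result unchanged but avoids overflow in exp.
  m = max(logits)
  exps = [math.exp(z - m) for z in logits]
  total = sum(exps)
  return [e / total for e in exps]

probs = softmax([2.0, -3.0, 0.1])  # positive values that sum to 1
```

Larger logits map to larger probabilities, which is why taking the argmax of the logits and the argmax of the probabilities gives the same class.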
Taking the [`tf.argmax`](https://tensorflow.google.cn/api_docs/python/tf/math/argmax) across classes gives us the predicted class index. But the model hasn't been trained yet, so these aren't good predictions.
```py
print("Prediction: {}".format(tf.argmax(predictions, axis=1)))
print(" Labels: {}".format(labels))
```
```py
Prediction: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
Labels: [0 0 2 2 0 2 0 1 0 0 1 1 0 1 0 0 2 2 1 1 0 2 2 0 0 2 1 0 0 0 2 2]
```
## Train the model
[Training](https://developers.google.cn/machine-learning/crash-course/glossary#training) is the stage of machine learning when the model is gradually optimized, or when the model *learns* the dataset. The goal is to learn enough about the structure of the training dataset to make predictions about unseen data. If you learn *too much* about the training dataset, then the predictions only work for the data the model has seen and will not generalize. This problem is called [overfitting](https://developers.google.cn/machine-learning/crash-course/glossary#overfitting); it is like memorizing the answers instead of understanding how to solve a problem.
The Iris classification problem is an example of [supervised machine learning](https://developers.google.cn/machine-learning/glossary/#supervised_machine_learning): the model is trained from examples that contain labels. In [unsupervised machine learning](https://developers.google.cn/machine-learning/glossary/#unsupervised_machine_learning), the examples don't contain labels. Instead, the model typically finds patterns among the features.
### Define the loss and gradient function
Both the training and evaluation stages need to calculate the model's [loss](https://developers.google.cn/machine-learning/crash-course/glossary#loss). This measures how far off a model's predictions are from the desired label; in other words, how badly the model is performing. We want to minimize, or optimize, this value.
Our model will calculate its loss using the [`tf.keras.losses.SparseCategoricalCrossentropy`](https://tensorflow.google.cn/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy) function, which takes the model's class probability predictions and the desired label, and returns the average loss across the examples.
```py
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
```
```py
def loss(model, x, y):
  y_ = model(x)
  return loss_object(y_true=y, y_pred=y_)

l = loss(model, features, labels)
print("Loss test: {}".format(l))
```
```py
Loss test: 1.6707830429077148
```
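What the loss function computes can be reproduced in plain Python: for each example, the negative log of the softmax probability assigned to the true class, averaged over the batch (a from-scratch sketch with made-up logits, not the TensorFlow implementation):

```py
import math

def sparse_categorical_crossentropy(y_true, logits):
  """Average negative log-probability of the true class, computed from logits."""
  total = 0.0
  for label, row in zip(y_true, logits):
    m = max(row)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
    total += log_sum_exp - row[label]  # equals -log(softmax(row)[label])
  return total / len(y_true)

loss_value = sparse_categorical_crossentropy([0, 2], [[2.0, 1.0, 0.1],
                                                      [0.5, 0.5, 3.0]])
```

The loss is small when the logit of the true class dominates its row, and grows as probability mass shifts to the wrong classes.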
Use the [tf.GradientTape](https://tensorflow.google.cn/api_docs/python/tf/GradientTape) context to calculate the [gradients](https://developers.google.cn/machine-learning/crash-course/glossary#gradient) used to optimize your model:
```py
def grad(model, inputs, targets):
  with tf.GradientTape() as tape:
    loss_value = loss(model, inputs, targets)
  return loss_value, tape.gradient(loss_value, model.trainable_variables)
```
### Create an optimizer
An [optimizer](https://developers.google.cn/machine-learning/crash-course/glossary#optimizer) applies the computed gradients to the model's variables to minimize the `loss` function. You can think of the loss function as a curved surface (see Figure 3), and we want to find its lowest point by walking around. The gradient points in the direction of steepest ascent, so we travel the opposite way and move down the hill. By iteratively calculating the loss and gradient for each batch, we adjust the model during training. Gradually, the model finds the best combination of weights and bias to minimize the loss. The lower the loss, the better the model's predictions.
| ![Optimization algorithms visualized over time in 3D space.](img/fb0bdd5ec0ad3a81aa686b46a6fa16d7.png) |
| **Figure 3.** Optimization algorithms visualized over time in 3D space.
(Source: [Stanford class CS231n](http://cs231n.github.io/neural-networks-3/), MIT License, Image credit: [Alec Radford](https://twitter.com/alecrad)) |
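The update rule behind this walk downhill reduces to one line: move each variable a small step against its gradient. A one-parameter sketch (the function name and values are illustrative):

```py
def sgd_step(w, grad, learning_rate):
  # Gradient descent: step in the direction opposite the gradient.
  return w - learning_rate * grad

# Minimize loss(w) = (w - 4)^2, whose gradient is 2 * (w - 4).
w = 0.0
learning_rate = 0.1
for _ in range(100):
  w = sgd_step(w, 2 * (w - 4), learning_rate)
# w is now very close to the minimizer, 4.
```

Optimizers such as Adam refine this rule with per-variable step sizes and momentum, but the core idea is the same.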
TensorFlow has many [optimization algorithms](https://tensorflow.google.cn/api_guides/python/train) available for training. This model uses [tf.keras.optimizers.Adam](https://tensorflow.google.cn/api_docs/python/tf/keras/optimizers/Adam), which implements a variant of [stochastic gradient descent](https://developers.google.cn/machine-learning/crash-course/glossary#gradient_descent) (SGD). The `learning_rate` sets the step size to take for each iteration down the hill. This is a *hyperparameter* that you'll commonly adjust to achieve better results.
Let's set up the optimizer:
```py
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
```
We'll use this to calculate a single optimization step:
```py
loss_value, grads = grad(model, features, labels)

print("Step: {}, Initial Loss: {}".format(optimizer.iterations.numpy(),
                                          loss_value.numpy()))

optimizer.apply_gradients(zip(grads, model.trainable_variables))

print("Step: {}, Loss: {}".format(optimizer.iterations.numpy(),
                                  loss(model, features, labels).numpy()))
```
```py
Step: 0, Initial Loss: 1.6707830429077148
Step: 1, Loss: 1.447718620300293
```
### Training loop
With all the pieces in place, the model is ready for training! A training loop feeds the dataset examples into the model to help it make better predictions. The following code block sets up these training steps:
1. Iterate over each *epoch*. One pass through the dataset is one epoch.
2. Within an epoch, iterate over each example in the training `Dataset`, grabbing its *features* (`x`) and *label* (`y`).
3. Using the example's features, make a prediction and compare it with the label. Measure the inaccuracy of the prediction and use that to calculate the model's loss and gradients.
4. Use the `optimizer` to update the model's variables.
5. Keep track of some stats for visualization.
6. Repeat for each epoch.
The `num_epochs` variable is the number of times to loop over the dataset collection. Counter-intuitively, training a model longer does not guarantee a better model. `num_epochs` is a [hyperparameter](https://developers.google.cn/machine-learning/glossary/#hyperparameter) that you can tune. Choosing the right number usually requires both experience and experimentation.
```py
## Note: Rerunning this cell uses the same model variables

# Keep results for plotting
train_loss_results = []
train_accuracy_results = []

num_epochs = 201

for epoch in range(num_epochs):
  epoch_loss_avg = tf.keras.metrics.Mean()
  epoch_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()

  # Training loop - using batches of 32
  for x, y in train_dataset:
    # Optimize the model
    loss_value, grads = grad(model, x, y)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    # Track progress
    epoch_loss_avg(loss_value)  # Add current batch loss
    # Compare predicted label to actual label
    epoch_accuracy(y, model(x))

  # End epoch
  train_loss_results.append(epoch_loss_avg.result())
  train_accuracy_results.append(epoch_accuracy.result())

  if epoch % 50 == 0:
    print("Epoch {:03d}: Loss: {:.3f}, Accuracy: {:.3%}".format(epoch,
                                                                epoch_loss_avg.result(),
                                                                epoch_accuracy.result()))
```
```py
Epoch 000: Loss: 1.470, Accuracy: 35.833%
Epoch 050: Loss: 0.112, Accuracy: 96.667%
Epoch 100: Loss: 0.055, Accuracy: 98.333%
Epoch 150: Loss: 0.065, Accuracy: 98.333%
Epoch 200: Loss: 0.053, Accuracy: 98.333%
```
### Visualize the loss function over time
While it's helpful to print out the model's training progress, it's often *more* helpful to see this progress. [TensorBoard](https://tensorflow.google.cn/guide/summaries_and_tensorboard) is a nice visualization tool that is packaged with TensorFlow, but we can create basic charts using the `matplotlib` module.
Interpreting these charts takes some experience, but you really want to see the *loss* go down and the *accuracy* go up.
```py
fig, axes = plt.subplots(2, sharex=True, figsize=(12, 8))
fig.suptitle('Training Metrics')
axes[0].set_ylabel("Loss", fontsize=14)
axes[0].plot(train_loss_results)
axes[1].set_ylabel("Accuracy", fontsize=14)
axes[1].set_xlabel("Epoch", fontsize=14)
axes[1].plot(train_accuracy_results)
plt.show()
```
![png](img/4123df32a452f5e3727c6372cf1fa755.png)
## Evaluate the model's effectiveness
Now that the model is trained, we can get some statistics on its performance.
*Evaluating* means determining how effectively the model makes predictions. To determine the model's effectiveness at Iris classification, pass some sepal and petal measurements to the model and ask the model to predict what Iris species they represent. Then compare the model's predictions against the actual labels. For example, a model that picked the correct species on half the input examples has an [accuracy](https://developers.google.cn/machine-learning/glossary/#accuracy) of `0.5`. Figure 4 shows a slightly more effective model, getting 4 of 5 predictions correct at 80% accuracy:
<colgroup><col span="4"> <col span="1" bgcolor="lightblue"> <col span="1" bgcolor="lightgreen"></colgroup>
| Example features | Label | Model prediction |
| 5.9 | 3.0 | 4.3 | 1.5 | 1 | 1 |
| 6.9 | 3.1 | 5.4 | 2.1 | 2 | 2 |
| 5.1 | 3.3 | 1.7 | 0.5 | 0 | 0 |
| 6.0 | 3.4 | 4.5 | 1.6 | 1 | 2 |
| 5.5 | 2.5 | 4.0 | 1.3 | 1 | 1 |
| **Figure 4.** An Iris classifier that is 80% accurate
  |
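The accuracy in Figure 4 is just the fraction of matching label/prediction pairs, which can be computed directly (a sketch using the five pairs from the table):

```py
def accuracy(labels, predictions):
  """Fraction of predictions that match the labels."""
  correct = sum(1 for y, p in zip(labels, predictions) if y == p)
  return correct / len(labels)

# The five predictions from Figure 4: four match, one does not.
acc = accuracy([1, 2, 0, 1, 1], [1, 2, 0, 2, 1])
```

This is what `tf.keras.metrics.Accuracy` accumulates batch by batch in the evaluation loop below.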
### Setup the test dataset
Evaluating the model is similar to training the model. The biggest difference is that the examples come from a separate [test set](https://developers.google.cn/machine-learning/crash-course/glossary#test_set) rather than the training set. To fairly assess a model's effectiveness, the examples used to evaluate a model must be different from the examples used to train the model.
The setup for the test `Dataset` is similar to the setup for the training `Dataset`. Download the CSV text file, parse the values, then give the data a little shuffle:
```py
test_url = "https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv"
test_fp = tf.keras.utils.get_file(fname=os.path.basename(test_url),
                                  origin=test_url)
```
```py
Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv
8192/573 [==============================] - 0s 0us/step
```
```py
test_dataset = tf.data.experimental.make_csv_dataset(
    test_fp,
    batch_size,
    column_names=column_names,
    label_name='species',
    num_epochs=1,
    shuffle=False)

test_dataset = test_dataset.map(pack_features_vector)
```
### Evaluate the model on the test dataset
Unlike the training stage, the model only evaluates a single [epoch](https://developers.google.cn/machine-learning/glossary/#epoch) of the test data. In the following code cell, we iterate over each example in the test set and compare the model's prediction against the actual label. This is used to measure the model's accuracy across the entire test set.
```py
test_accuracy = tf.keras.metrics.Accuracy()

for (x, y) in test_dataset:
  logits = model(x)
  prediction = tf.argmax(logits, axis=1, output_type=tf.int32)
  test_accuracy(prediction, y)

print("Test set accuracy: {:.3%}".format(test_accuracy.result()))
```
```py
Test set accuracy: 96.667%
```
We can see that on the last batch, for example, the model is usually correct:
```py
tf.stack([y,prediction],axis=1)
```
```py
<tf.Tensor: shape=(30, 2), dtype=int32, numpy=
array([[1, 1],
[2, 2],
[0, 0],
[1, 1],
[1, 1],
[1, 1],
[0, 0],
[2, 2],
[1, 1],
[2, 2],
[2, 2],
[0, 0],
[2, 2],
[1, 1],
[1, 1],
[0, 0],
[1, 1],
[0, 0],
[0, 0],
[2, 2],
[0, 0],
[1, 1],
[2, 2],
[1, 2],
[1, 1],
[1, 1],
[0, 0],
[1, 1],
[2, 2],
[1, 1]], dtype=int32)>
```
## Use the trained model to make predictions
We've trained a model and "proven" that it's good, but not perfect, at classifying Iris species. Now let's use the trained model to make some predictions on [unlabeled examples](https://developers.google.cn/machine-learning/glossary/#unlabeled_example); that is, examples that contain features but not labels.
In real life, unlabeled examples could come from lots of different sources, including apps, CSV files, and data feeds. For now, we're going to manually provide three unlabeled examples to predict their labels. Recall that the label numbers are mapped to a named representation:
* `0`: Iris setosa
* `1`: Iris versicolor
* `2`: Iris virginica
```py
predict_dataset = tf.convert_to_tensor([
    [5.1, 3.3, 1.7, 0.5,],
    [5.9, 3.0, 4.2, 1.5,],
    [6.9, 3.1, 5.4, 2.1]
])

predictions = model(predict_dataset)

for i, logits in enumerate(predictions):
  class_idx = tf.argmax(logits).numpy()
  p = tf.nn.softmax(logits)[class_idx]
  name = class_names[class_idx]
  print("Example {} prediction: {} ({:4.1f}%)".format(i, name, 100*p))
```
```py
Example 0 prediction: Iris setosa (99.9%)
Example 1 prediction: Iris versicolor (99.8%)
Example 2 prediction: Iris virginica (99.6%)
```
@@ -1 +0,0 @@
# Distributed training
@@ -1,450 +0,0 @@
# Distributed training with Keras
> Original: [https://tensorflow.google.cn/tutorials/distribute/keras](https://tensorflow.google.cn/tutorials/distribute/keras)
## Overview
The [`tf.distribute.Strategy`](https://tensorflow.google.cn/api_docs/python/tf/distribute/Strategy) API provides an abstraction for distributing your training across multiple processing units. The goal is to allow users to enable distributed training using existing models and training code, with minimal changes.
This tutorial uses [`tf.distribute.MirroredStrategy`](https://tensorflow.google.cn/api_docs/python/tf/distribute/MirroredStrategy), which does in-graph replication with synchronous training on many GPUs on one machine. Essentially, it copies all of the model's variables to each processor. Then, it uses [all-reduce](http://mpitutorial.com/tutorials/mpi-reduce-and-allreduce/) to combine the gradients from all processors, and applies the combined value to all copies of the model.
`MirroredStrategy` is one of several distribution strategies available in TensorFlow. You can read about more strategies in the [distribution strategy guide](https://tensorflow.google.cn/guide/distribute_strategy).
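The all-reduce step can be illustrated with a toy, framework-free sketch: each replica computes gradients on its own slice of the batch, and every replica then receives the identical combined (here, averaged) gradients (the function name and numbers are illustrative, not the real API):

```py
def all_reduce_mean(per_replica_grads):
  """Average the gradients variable-by-variable and broadcast to every replica."""
  num_replicas = len(per_replica_grads)
  combined = [sum(g) / num_replicas for g in zip(*per_replica_grads)]
  return [list(combined) for _ in range(num_replicas)]  # identical copy per replica

replica_grads = [[0.2, -0.4],   # gradients from replica 0 (two variables)
                 [0.4,  0.0]]   # gradients from replica 1
reduced = all_reduce_mean(replica_grads)
```

Real all-reduce implementations (for example, NCCL's ring all-reduce) avoid a central coordinator, but the result every replica sees is the same: one combined gradient per variable.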
### Keras API
This example uses the [`tf.keras`](https://tensorflow.google.cn/api_docs/python/tf/keras) API to build the model and training loop. For custom training loops, see the [tf.distribute.Strategy with training loops](/tutorials/distribute/training_loops) tutorial.
## Import dependencies
```py
# Import TensorFlow and TensorFlow Datasets
import tensorflow_datasets as tfds
import tensorflow as tf
tfds.disable_progress_bar()
import os
```
```py
print(tf.__version__)
```
```py
2.3.0
```
## Download the dataset
Download the MNIST dataset and load it from [TensorFlow Datasets](https://tensorflow.google.cn/datasets). This returns a dataset in [`tf.data`](https://tensorflow.google.cn/api_docs/python/tf/data) format.
Setting `with_info` to `True` includes the metadata for the entire dataset, which is being saved here to `info`. Among other things, this metadata object includes the number of train and test examples.
```py
datasets, info = tfds.load(name='mnist', with_info=True, as_supervised=True)
mnist_train, mnist_test = datasets['train'], datasets['test']
```
## Define the distribution strategy
Create a `MirroredStrategy` object. This will handle distribution, and provides a context manager ([`tf.distribute.MirroredStrategy.scope`](https://tensorflow.google.cn/api_docs/python/tf/distribute/MirroredStrategy#scope)) to build your model inside.
```py
strategy = tf.distribute.MirroredStrategy()
```
```py
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
```
```py
print('Number of devices: {}'.format(strategy.num_replicas_in_sync))
```
```py
Number of devices: 1
```
## Setup the input pipeline
When training a model with multiple GPUs, you can use the extra computing power effectively by increasing the batch size. In general, use the largest batch size that fits the GPU memory, and tune the learning rate accordingly.
```py
# You can also do info.splits.total_num_examples to get the total
# number of examples in the dataset.
num_train_examples = info.splits['train'].num_examples
num_test_examples = info.splits['test'].num_examples
BUFFER_SIZE = 10000
BATCH_SIZE_PER_REPLICA = 64
BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync
```
Pixel values, which are 0-255, [have to be normalized to the 0-1 range](https://en.wikipedia.org/wiki/Feature_scaling). Define this scaling in a function.
```py
def scale(image, label):
  image = tf.cast(image, tf.float32)
  image /= 255

  return image, label
```
Apply this function to the training and test data, shuffle the training data, and [batch it for training](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#batch). Notice we are also keeping an in-memory cache of the training data to improve performance.
```py
train_dataset = mnist_train.map(scale).cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
eval_dataset = mnist_test.map(scale).batch(BATCH_SIZE)
```
## Create the model
Create and compile the Keras model in the context of `strategy.scope`.
```py
with strategy.scope():
  model = tf.keras.Sequential([
      tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
      tf.keras.layers.MaxPooling2D(),
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(64, activation='relu'),
      tf.keras.layers.Dense(10)
  ])

  model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                optimizer=tf.keras.optimizers.Adam(),
                metrics=['accuracy'])
```
## Define the callbacks
The callbacks used here are:
* *TensorBoard*: This callback writes a log for TensorBoard, which allows you to visualize the graphs.
* *Model Checkpoint*: This callback saves the model after every epoch.
* *Learning Rate Scheduler*: Using this callback, you can schedule the learning rate to change after every epoch/batch.
For illustrative purposes, add a print callback to display the *learning rate* in the notebook.
```py
# Define the checkpoint directory to store the checkpoints
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")
```
```py
# Function for decaying the learning rate.
# You can define any decay function you need.
def decay(epoch):
  if epoch < 3:
    return 1e-3
  elif epoch >= 3 and epoch < 7:
    return 1e-4
  else:
    return 1e-5
```
```py
# Callback for printing the learning rate at the end of each epoch.
class PrintLR(tf.keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs=None):
    print('\nLearning rate for epoch {} is {}'.format(epoch + 1,
                                                      model.optimizer.lr.numpy()))
```
```py
callbacks = [
    tf.keras.callbacks.TensorBoard(log_dir='./logs'),
    tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_prefix,
                                       save_weights_only=True),
    tf.keras.callbacks.LearningRateScheduler(decay),
    PrintLR()
]
```
## Train and evaluate
Now, train the model in the usual way, calling `fit` on the model and passing in the dataset created at the beginning of the tutorial. This step is the same whether you are distributing the training or not.
```py
model.fit(train_dataset, epochs=12, callbacks=callbacks)
```
```py
Epoch 1/12
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/data/ops/multi_device_iterator_ops.py:601: get_next_as_optional (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Iterator.get_next_as_optional()` instead.
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
1/938 [..............................] - ETA: 0s - loss: 2.3194 - accuracy: 0.0938WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/ops/summary_ops_v2.py:1277: stop (from tensorflow.python.eager.profiler) is deprecated and will be removed after 2020-07-01.
Instructions for updating:
use `tf.profiler.experimental.stop` instead.
Warning:tensorflow:Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0046s vs `on_train_batch_end` time: 0.0296s). Check your callbacks.
932/938 [============================>.] - ETA: 0s - loss: 0.2055 - accuracy: 0.9422
Learning rate for epoch 1 is 0.0010000000474974513
938/938 [==============================] - 4s 5ms/step - loss: 0.2049 - accuracy: 0.9424
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
Epoch 2/12
922/938 [============================>.] - ETA: 0s - loss: 0.0681 - accuracy: 0.9797
Learning rate for epoch 2 is 0.0010000000474974513
938/938 [==============================] - 3s 3ms/step - loss: 0.0680 - accuracy: 0.9798
Epoch 3/12
930/938 [============================>.] - ETA: 0s - loss: 0.0484 - accuracy: 0.9855
Learning rate for epoch 3 is 0.0010000000474974513
938/938 [==============================] - 3s 3ms/step - loss: 0.0484 - accuracy: 0.9855
Epoch 4/12
920/938 [============================>.] - ETA: 0s - loss: 0.0277 - accuracy: 0.9925
Learning rate for epoch 4 is 9.999999747378752e-05
938/938 [==============================] - 3s 3ms/step - loss: 0.0276 - accuracy: 0.9926
Epoch 5/12
931/938 [============================>.] - ETA: 0s - loss: 0.0248 - accuracy: 0.9935
Learning rate for epoch 5 is 9.999999747378752e-05
938/938 [==============================] - 3s 3ms/step - loss: 0.0247 - accuracy: 0.9936
Epoch 6/12
931/938 [============================>.] - ETA: 0s - loss: 0.0231 - accuracy: 0.9938
Learning rate for epoch 6 is 9.999999747378752e-05
938/938 [==============================] - 3s 3ms/step - loss: 0.0230 - accuracy: 0.9938
Epoch 7/12
936/938 [============================>.] - ETA: 0s - loss: 0.0217 - accuracy: 0.9941
Learning rate for epoch 7 is 9.999999747378752e-05
938/938 [==============================] - 3s 3ms/step - loss: 0.0216 - accuracy: 0.9941
Epoch 8/12
932/938 [============================>.] - ETA: 0s - loss: 0.0189 - accuracy: 0.9952
Learning rate for epoch 8 is 9.999999747378752e-06
938/938 [==============================] - 3s 3ms/step - loss: 0.0189 - accuracy: 0.9952
Epoch 9/12
932/938 [============================>.] - ETA: 0s - loss: 0.0188 - accuracy: 0.9953
Learning rate for epoch 9 is 9.999999747378752e-06
938/938 [==============================] - 3s 3ms/step - loss: 0.0187 - accuracy: 0.9953
Epoch 10/12
932/938 [============================>.] - ETA: 0s - loss: 0.0185 - accuracy: 0.9953
Learning rate for epoch 10 is 9.999999747378752e-06
938/938 [==============================] - 3s 3ms/step - loss: 0.0185 - accuracy: 0.9953
Epoch 11/12
934/938 [============================>.] - ETA: 0s - loss: 0.0183 - accuracy: 0.9953
Learning rate for epoch 11 is 9.999999747378752e-06
938/938 [==============================] - 3s 3ms/step - loss: 0.0184 - accuracy: 0.9953
Epoch 12/12
931/938 [============================>.] - ETA: 0s - loss: 0.0183 - accuracy: 0.9954
Learning rate for epoch 12 is 9.999999747378752e-06
938/938 [==============================] - 3s 3ms/step - loss: 0.0182 - accuracy: 0.9955
<tensorflow.python.keras.callbacks.History at 0x7fe470118978>
```
As you can see below, the checkpoints are being saved.
```py
# Check the checkpoint directory
ls {checkpoint_dir}
```
```py
checkpoint ckpt_4.data-00000-of-00001
ckpt_1.data-00000-of-00001 ckpt_4.index
ckpt_1.index ckpt_5.data-00000-of-00001
ckpt_10.data-00000-of-00001 ckpt_5.index
ckpt_10.index ckpt_6.data-00000-of-00001
ckpt_11.data-00000-of-00001 ckpt_6.index
ckpt_11.index ckpt_7.data-00000-of-00001
ckpt_12.data-00000-of-00001 ckpt_7.index
ckpt_12.index ckpt_8.data-00000-of-00001
ckpt_2.data-00000-of-00001 ckpt_8.index
ckpt_2.index ckpt_9.data-00000-of-00001
ckpt_3.data-00000-of-00001 ckpt_9.index
ckpt_3.index
```
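For illustration only, here is how you could pick the newest prefix from the listing above by epoch number. Note that `tf.train.latest_checkpoint` does not parse file names like this; it reads the checkpoint state recorded in the `checkpoint` file.

```python
import re

# Hypothetical name-parsing sketch over files like those listed above.
files = ['ckpt_1.index', 'ckpt_2.index', 'ckpt_10.index', 'ckpt_12.index']
latest = max(files, key=lambda name: int(re.search(r'ckpt_(\d+)', name).group(1)))
print(latest)
```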
To see how the model performs, load the latest checkpoint and call `evaluate` on the test data.
Call `evaluate` as before, using the appropriate dataset.
```py
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
eval_loss, eval_acc = model.evaluate(eval_dataset)
print('Eval loss: {}, Eval Accuracy: {}'.format(eval_loss, eval_acc))
```
```py
157/157 [==============================] - 1s 6ms/step - loss: 0.0399 - accuracy: 0.9861
Eval loss: 0.03988004848361015, Eval Accuracy: 0.9861000180244446
```
To see the output, you can download and view the TensorBoard logs in a terminal.
```py
$ tensorboard --logdir=path/to/log-directory
```
```py
ls -sh ./logs
```
```py
total 4.0K
4.0K train
```
## Export to SavedModel
Export the graph and the variables to the platform-agnostic SavedModel format. After your model is saved, you can load it with or without the scope.
```py
path = 'saved_model/'
```
```py
model.save(path, save_format='tf')
```
```py
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/training/tracking/tracking.py:111: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/training/tracking/tracking.py:111: Layer.updates (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
INFO:tensorflow:Assets written to: saved_model/assets
```
Load the model without `strategy.scope`.
```py
unreplicated_model = tf.keras.models.load_model(path)
unreplicated_model.compile(
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
optimizer=tf.keras.optimizers.Adam(),
metrics=['accuracy'])
eval_loss, eval_acc = unreplicated_model.evaluate(eval_dataset)
print('Eval loss: {}, Eval Accuracy: {}'.format(eval_loss, eval_acc))
```
```py
157/157 [==============================] - 1s 3ms/step - loss: 0.0399 - accuracy: 0.9861
Eval loss: 0.03988004848361015, Eval Accuracy: 0.9861000180244446
```
Load the model with `strategy.scope`.
```py
with strategy.scope():
replicated_model = tf.keras.models.load_model(path)
replicated_model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
optimizer=tf.keras.optimizers.Adam(),
metrics=['accuracy'])
eval_loss, eval_acc = replicated_model.evaluate(eval_dataset)
print ('Eval loss: {}, Eval Accuracy: {}'.format(eval_loss, eval_acc))
```
```py
157/157 [==============================] - 1s 5ms/step - loss: 0.0399 - accuracy: 0.9861
Eval loss: 0.03988004848361015, Eval Accuracy: 0.9861000180244446
```
### Examples and tutorials
Here are some examples for using distribution strategies with Keras fit/compile:
1. An example that trains [Transformer](https://github.com/tensorflow/models/blob/master/official/nlp/transformer/transformer_main.py) using [`tf.distribute.MirroredStrategy`](https://tensorflow.google.cn/api_docs/python/tf/distribute/MirroredStrategy).
2. An example that trains [NCF](https://github.com/tensorflow/models/blob/master/official/recommendation/ncf_keras_main.py) using [`tf.distribute.MirroredStrategy`](https://tensorflow.google.cn/api_docs/python/tf/distribute/MirroredStrategy).
More examples are listed in the [Distribution strategy guide](https://tensorflow.google.cn/guide/distribute_strategy#examples_and_tutorials).
## Next steps
* Read the [Distribution strategy guide](https://tensorflow.google.cn/guide/distribute_strategy).
* Read the [Distributed training with custom training loops](/tutorials/distribute/training_loops) tutorial.
Note: [`tf.distribute.Strategy`](https://tensorflow.google.cn/api_docs/python/tf/distribute/Strategy) is under active development, and we will be adding more examples and tutorials in the near future. Please give it a try; we welcome your feedback via [issues on GitHub](https://github.com/tensorflow/tensorflow/issues/new).
# Custom training with tf.distribute.Strategy
> Original: [https://tensorflow.google.cn/tutorials/distribute/custom_training](https://tensorflow.google.cn/tutorials/distribute/custom_training)
This tutorial demonstrates how to use [`tf.distribute.Strategy`](https://tensorflow.google.cn/guide/distribute_strategy) with custom training loops. We will train a simple CNN model on the Fashion MNIST dataset, which contains 60,000 training images and 10,000 test images, each of size 28 x 28.
We use custom training loops to train our model because they give us flexibility and greater control over training. Moreover, they make it easier to debug the model and the training loop.
```py
# Import TensorFlow
import tensorflow as tf
# Helper libraries
import numpy as np
import os
print(tf.__version__)
```
```py
2.3.0
```
## Download the Fashion MNIST dataset
```py
fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
# Add a dimension to the arrays -> new shape == (28, 28, 1)
# We do this because the first layer in our model is a convolutional
# layer and it requires a 4D input (batch_size, height, width, channels).
# The batch_size dimension will be added later.
train_images = train_images[..., None]
test_images = test_images[..., None]
# Scale the images to the [0, 1] range.
train_images = train_images / np.float32(255)
test_images = test_images / np.float32(255)
```
```py
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
32768/29515 [=================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
26427392/26421880 [==============================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
8192/5148 [===============================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
4423680/4422102 [==============================] - 0s 0us/step
```
## Create a strategy to distribute the variables and the graph
How does the [`tf.distribute.MirroredStrategy`](https://tensorflow.google.cn/api_docs/python/tf/distribute/MirroredStrategy) strategy work?
* All the variables and the model graph are replicated on the replicas.
* Input is evenly distributed across the replicas.
* Each replica calculates the loss and gradients for the input it received.
* The gradients are synced across all the replicas by summing them.
* After the sync, the same update is made to the copies of the variables on each replica.
Note: You can put all the code below inside a single scope. We are dividing it into several code cells for illustration purposes.
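The steps above can be sketched in plain Python as a toy model of the synchronous update (illustrative only, not the actual TensorFlow implementation): each replica computes a gradient on its slice of the input, the gradients are combined by summing, and every replica applies the identical update, so the mirrored variables never diverge.

```python
# Toy synchronous step: one scalar weight mirrored across replicas.
def sync_step(weight, per_replica_grads, lr):
    total_grad = sum(per_replica_grads)           # gradients synced by summing
    # Every replica applies the same update to its copy of the variable.
    return [weight - lr * total_grad for _ in per_replica_grads]

replica_weights = sync_step(1.0, [0.2, 0.4], lr=0.1)
print(replica_weights)  # both replicas hold the same value
```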
```py
# If the list of devices is not specified in `tf.distribute.MirroredStrategy`, it will be auto-detected.
strategy = tf.distribute.MirroredStrategy()
```
```py
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
```
```py
print ('Number of devices: {}'.format(strategy.num_replicas_in_sync))
```
```py
Number of devices: 1
```
## Setup input pipeline
Export the graph and the variables to the platform-agnostic SavedModel format. After your model is saved, you can load it with or without the scope.
```py
BUFFER_SIZE = len(train_images)
BATCH_SIZE_PER_REPLICA = 64
GLOBAL_BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync
EPOCHS = 10
```
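The global batch size above scales linearly with the number of replicas reported by the strategy; with the single device in this run it stays at 64. A quick sketch with hypothetical replica counts (at runtime, `strategy.num_replicas_in_sync` supplies the real value):

```python
# Hypothetical replica counts; strategy.num_replicas_in_sync supplies
# the real value at runtime.
BATCH_SIZE_PER_REPLICA = 64
global_batch = {n: BATCH_SIZE_PER_REPLICA * n for n in (1, 2, 4)}
print(global_batch)
```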
Create the datasets and distribute them:
```py
train_dataset = tf.data.Dataset.from_tensor_slices((train_images, train_labels)).shuffle(BUFFER_SIZE).batch(GLOBAL_BATCH_SIZE)
test_dataset = tf.data.Dataset.from_tensor_slices((test_images, test_labels)).batch(GLOBAL_BATCH_SIZE)
train_dist_dataset = strategy.experimental_distribute_dataset(train_dataset)
test_dist_dataset = strategy.experimental_distribute_dataset(test_dataset)
```
## Create the model
Create a model using [`tf.keras.Sequential`](https://tensorflow.google.cn/api_docs/python/tf/keras/Sequential). You can also use the Model Subclassing API to do this.
```py
def create_model():
model = tf.keras.Sequential([
tf.keras.layers.Conv2D(32, 3, activation='relu'),
tf.keras.layers.MaxPooling2D(),
tf.keras.layers.Conv2D(64, 3, activation='relu'),
tf.keras.layers.MaxPooling2D(),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(10, activation='softmax')
])
return model
```
```py
# Create a checkpoint directory to store the checkpoints.
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
```
## Define the loss function
Normally, on a single machine with one GPU/CPU, the loss is divided by the number of examples in the batch of input.
*So, how should the loss be calculated when using a [`tf.distribute.Strategy`](https://tensorflow.google.cn/api_docs/python/tf/distribute/Strategy)?*
* For an example, let's say you have 4 GPUs and a batch size of 64. One batch of input is distributed across the replicas (4 GPUs), with each replica getting an input of size 16.
* The model on each replica does a forward pass with its respective input and calculates the loss. Now, instead of dividing the loss by the number of examples in its respective input (BATCH_SIZE_PER_REPLICA = 16), the loss should be divided by the GLOBAL_BATCH_SIZE (64).
*Why do this?*
* This needs to be done because after the gradients are calculated on each replica, they are synced across the replicas by **summing** them.
*How to do this in TensorFlow?*
* If you're writing a custom training loop, as in this tutorial, you should sum the per-example losses and divide the sum by the GLOBAL_BATCH_SIZE: `scale_loss = tf.reduce_sum(loss) * (1. / GLOBAL_BATCH_SIZE)`, or you can use `tf.nn.compute_average_loss`, which takes the per-example loss, optional sample weights, and GLOBAL_BATCH_SIZE as arguments and returns the scaled loss.
* If you are using regularization losses in your model, then you need to scale the loss value by the number of replicas. You can do this by using the [`tf.nn.scale_regularization_loss`](https://tensorflow.google.cn/api_docs/python/tf/nn/scale_regularization_loss) function.
* Using [`tf.reduce_mean`](https://tensorflow.google.cn/api_docs/python/tf/math/reduce_mean) is not recommended. Doing so divides the loss by the actual per-replica batch size, which may vary step to step.
* This reduction and scaling is done automatically in Keras `model.compile` and `model.fit`.
* If you are using [`tf.keras.losses`](https://tensorflow.google.cn/api_docs/python/tf/keras/losses) classes (as in the example below), the loss reduction needs to be explicitly specified to be either `NONE` or `SUM`. `AUTO` and `SUM_OVER_BATCH_SIZE` are disallowed when used with [`tf.distribute.Strategy`](https://tensorflow.google.cn/api_docs/python/tf/distribute/Strategy). `AUTO` is disallowed because the user should explicitly think about what reduction is correct in the distributed case. `SUM_OVER_BATCH_SIZE` is disallowed because currently it would only divide by the per-replica batch size and leave the division by the number of replicas to the user, which is easy to miss. So, instead, we ask the user to do the reduction explicitly.
```py
with strategy.scope():
# Set reduction to `none` so we can do the reduction afterwards and divide by the global batch size.
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
reduction=tf.keras.losses.Reduction.NONE)
# or use loss_fn = tf.keras.losses.sparse_categorical_crossentropy
def compute_loss(labels, predictions):
per_example_loss = loss_object(labels, predictions)
return tf.nn.compute_average_loss(per_example_loss, global_batch_size=GLOBAL_BATCH_SIZE)
```
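The scaling rule described above can be checked with plain-Python arithmetic (4 replicas, global batch of 64; the constant 0.5 loss is a made-up value for illustration): each replica divides its summed per-example losses by the global batch size, so summing the replica losses recovers the mean over all 64 examples.

```python
GLOBAL_BATCH_SIZE = 64
num_replicas = 4
per_replica_batch = GLOBAL_BATCH_SIZE // num_replicas     # 16 examples each

per_example_losses = [0.5] * per_replica_batch            # one replica's losses
replica_loss = sum(per_example_losses) / GLOBAL_BATCH_SIZE

# Summing across the 4 replicas yields the true mean over all 64 examples.
global_mean_loss = replica_loss * num_replicas
print(global_mean_loss)
```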
## Define the metrics to track loss and accuracy
These metrics track the test loss and the training and test accuracy. You can use `.result()` to get the accumulated statistics at any time.
```py
with strategy.scope():
test_loss = tf.keras.metrics.Mean(name='test_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(
name='train_accuracy')
test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(
name='test_accuracy')
```
## Training loop
```py
# The model and optimizer must be created under `strategy.scope`.
with strategy.scope():
model = create_model()
optimizer = tf.keras.optimizers.Adam()
checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model)
```
```py
with strategy.scope():
def train_step(inputs):
images, labels = inputs
with tf.GradientTape() as tape:
predictions = model(images, training=True)
loss = compute_loss(labels, predictions)
gradients = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
train_accuracy.update_state(labels, predictions)
return loss
def test_step(inputs):
images, labels = inputs
predictions = model(images, training=False)
t_loss = loss_object(labels, predictions)
test_loss.update_state(t_loss)
test_accuracy.update_state(labels, predictions)
```
```py
with strategy.scope():
# `experimental_run_v2` replicates the provided computation and runs it with the distributed input.
@tf.function
def distributed_train_step(dataset_inputs):
per_replica_losses = strategy.experimental_run_v2(train_step,
args=(dataset_inputs,))
return strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica_losses,
axis=None)
@tf.function
def distributed_test_step(dataset_inputs):
return strategy.experimental_run_v2(test_step, args=(dataset_inputs,))
for epoch in range(EPOCHS):
# Train loop
total_loss = 0.0
num_batches = 0
for x in train_dist_dataset:
total_loss += distributed_train_step(x)
num_batches += 1
train_loss = total_loss / num_batches
# Test loop
for x in test_dist_dataset:
distributed_test_step(x)
if epoch % 2 == 0:
checkpoint.save(checkpoint_prefix)
template = ("Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, "
"Test Accuracy: {}")
print (template.format(epoch+1, train_loss,
train_accuracy.result()*100, test_loss.result(),
test_accuracy.result()*100))
test_loss.reset_states()
train_accuracy.reset_states()
test_accuracy.reset_states()
```
```py
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/data/ops/multi_device_iterator_ops.py:601: get_next_as_optional (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Iterator.get_next_as_optional()` instead.
WARNING:tensorflow:From <ipython-input-1-6439d0e9d271>:5: StrategyBase.experimental_run_v2 (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
renamed to `run`
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
Epoch 1, Loss: 0.5272247791290283, Accuracy: 80.95500183105469, Test Loss: 0.39799919724464417, Test Accuracy: 86.08000183105469
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
Epoch 2, Loss: 0.3536641597747803, Accuracy: 87.19000244140625, Test Loss: 0.3652512729167938, Test Accuracy: 86.79999542236328
Epoch 3, Loss: 0.30651605129241943, Accuracy: 88.96333312988281, Test Loss: 0.35199666023254395, Test Accuracy: 86.76000213623047
Epoch 4, Loss: 0.2756423354148865, Accuracy: 89.99333190917969, Test Loss: 0.2974560558795929, Test Accuracy: 89.1500015258789
Epoch 5, Loss: 0.24928639829158783, Accuracy: 90.86833953857422, Test Loss: 0.28945034742355347, Test Accuracy: 89.31999969482422
Epoch 6, Loss: 0.22822219133377075, Accuracy: 91.66999816894531, Test Loss: 0.2690503001213074, Test Accuracy: 90.13999938964844
Epoch 7, Loss: 0.21215270459651947, Accuracy: 92.19833374023438, Test Loss: 0.2673594057559967, Test Accuracy: 90.37000274658203
Epoch 8, Loss: 0.19466665387153625, Accuracy: 92.86500549316406, Test Loss: 0.280720591545105, Test Accuracy: 90.36000061035156
Epoch 9, Loss: 0.1819683462381363, Accuracy: 93.4000015258789, Test Loss: 0.2655133008956909, Test Accuracy: 90.54000091552734
Epoch 10, Loss: 0.16936612129211426, Accuracy: 93.711669921875, Test Loss: 0.26561689376831055, Test Accuracy: 90.55999755859375
```
Things to note in the example above:
* We iterate over the `train_dist_dataset` and `test_dist_dataset` using a `for x in ...` construct.
* The scaled loss is the return value of `distributed_train_step`. This value is aggregated across replicas using the [`tf.distribute.Strategy.reduce`](https://tensorflow.google.cn/api_docs/python/tf/distribute/Strategy#reduce) call and then across batches by summing the return values of the [`tf.distribute.Strategy.reduce`](https://tensorflow.google.cn/api_docs/python/tf/distribute/Strategy#reduce) calls.
* `tf.keras.Metrics` should be updated inside `train_step` and `test_step`, which are executed by `tf.distribute.Strategy.experimental_run_v2`.
* `tf.distribute.Strategy.experimental_run_v2` returns results from each local replica in the strategy, and there are multiple ways to consume this result. You can do [`tf.distribute.Strategy.reduce`](https://tensorflow.google.cn/api_docs/python/tf/distribute/Strategy#reduce) to get an aggregated value. You can also do [`tf.distribute.Strategy.experimental_local_results`](https://tensorflow.google.cn/api_docs/python/tf/distribute/Strategy#experimental_local_results) to get the list of values contained in the result, one per local replica.
## Restore the latest checkpoint and test
A model checkpointed with [`tf.distribute.Strategy`](https://tensorflow.google.cn/api_docs/python/tf/distribute/Strategy) can be restored with or without a strategy.
```py
eval_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(
name='eval_accuracy')
new_model = create_model()
new_optimizer = tf.keras.optimizers.Adam()
test_dataset = tf.data.Dataset.from_tensor_slices((test_images, test_labels)).batch(GLOBAL_BATCH_SIZE)
```
```py
@tf.function
def eval_step(images, labels):
predictions = new_model(images, training=False)
eval_accuracy(labels, predictions)
```
```py
checkpoint = tf.train.Checkpoint(optimizer=new_optimizer, model=new_model)
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))
for images, labels in test_dataset:
eval_step(images, labels)
print ('Accuracy after restoring the saved model without strategy: {}'.format(
eval_accuracy.result()*100))
```
```py
Accuracy after restoring the saved model without strategy: 90.54000091552734
```
## Alternate ways of iterating over a dataset
### Using iterators
If you want to iterate over a given number of steps and not through the entire dataset, you can create an iterator using the `iter` call and explicitly call `next` on it. You can choose to iterate over the dataset both inside and outside of a tf.function. Here is a small snippet demonstrating iteration of the dataset outside of a tf.function using an iterator.
```py
with strategy.scope():
for _ in range(EPOCHS):
total_loss = 0.0
num_batches = 0
train_iter = iter(train_dist_dataset)
for _ in range(10):
total_loss += distributed_train_step(next(train_iter))
num_batches += 1
average_train_loss = total_loss / num_batches
template = ("Epoch {}, Loss: {}, Accuracy: {}")
print (template.format(epoch+1, average_train_loss, train_accuracy.result()*100))
train_accuracy.reset_states()
```
```py
Epoch 10, Loss: 0.17099234461784363, Accuracy: 93.75
Epoch 10, Loss: 0.12641692161560059, Accuracy: 95.9375
Epoch 10, Loss: 0.11636483669281006, Accuracy: 96.09375
Epoch 10, Loss: 0.1404765546321869, Accuracy: 95.0
Epoch 10, Loss: 0.16838286817073822, Accuracy: 92.5
Epoch 10, Loss: 0.1905607134103775, Accuracy: 93.125
Epoch 10, Loss: 0.12706035375595093, Accuracy: 95.78125
Epoch 10, Loss: 0.14852401614189148, Accuracy: 93.59375
Epoch 10, Loss: 0.11990274488925934, Accuracy: 95.9375
Epoch 10, Loss: 0.1237613782286644, Accuracy: 95.9375
```
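The `iter`/`next` pattern above works the same way on any Python iterable; a minimal sketch with a plain-Python stand-in for `train_dist_dataset`:

```python
# Take a fixed number of steps without walking the whole dataset.
dataset = range(100)              # stand-in for the distributed dataset
iterator = iter(dataset)
steps = [next(iterator) for _ in range(10)]
print(steps)
```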
### Iterating inside a tf.function
You can also iterate over the entire input `train_dist_dataset` inside a tf.function using the `for x in ...` construct, or by creating iterators as we did above. The example below demonstrates wrapping one epoch of training in a tf.function and iterating over `train_dist_dataset` inside the function.
```py
with strategy.scope():
@tf.function
def distributed_train_epoch(dataset):
total_loss = 0.0
num_batches = 0
for x in dataset:
per_replica_losses = strategy.experimental_run_v2(train_step,
args=(x,))
total_loss += strategy.reduce(
tf.distribute.ReduceOp.SUM, per_replica_losses, axis=None)
num_batches += 1
return total_loss / tf.cast(num_batches, dtype=tf.float32)
for epoch in range(EPOCHS):
train_loss = distributed_train_epoch(train_dist_dataset)
template = ("Epoch {}, Loss: {}, Accuracy: {}")
print (template.format(epoch+1, train_loss, train_accuracy.result()*100))
train_accuracy.reset_states()
```
```py
Epoch 1, Loss: 0.1545342057943344, Accuracy: 94.34666442871094
Epoch 2, Loss: 0.14368833601474762, Accuracy: 94.76666259765625
Epoch 3, Loss: 0.13302761316299438, Accuracy: 95.22833251953125
Epoch 4, Loss: 0.12302733212709427, Accuracy: 95.51499938964844
Epoch 5, Loss: 0.11504675447940826, Accuracy: 95.7300033569336
Epoch 6, Loss: 0.10611504316329956, Accuracy: 96.02000427246094
Epoch 7, Loss: 0.09776321798563004, Accuracy: 96.3566665649414
Epoch 8, Loss: 0.0923474133014679, Accuracy: 96.54166412353516
Epoch 9, Loss: 0.08583918958902359, Accuracy: 96.85833740234375
Epoch 10, Loss: 0.0784970372915268, Accuracy: 97.12332916259766
```
### Tracking training loss across replicas
Note: As a general rule, you should use `tf.keras.Metrics` to track per-sample values and avoid values that have been aggregated within a replica.
We do *not* recommend using [`tf.metrics.Mean`](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/Mean) to track the training loss across different replicas, because of the loss-scaling computation that is carried out.
For example, if you run a training job with the following characteristics:
* Two replicas
* Two examples processed on each replica
* Resulting loss values: [2, 3] and [4, 5] on each replica
* Global batch size = 4
With loss scaling, you calculate the per-sample loss value on each replica by adding the loss values and then dividing by the global batch size. In this case: `(2 + 3) / 4 = 1.25` and `(4 + 5) / 4 = 2.25`.
If you use [`tf.metrics.Mean`](https://tensorflow.google.cn/api_docs/python/tf/keras/metrics/Mean) to track loss across the two replicas, the result is different. In this example, you end up with a `total` of 3.50 and a `count` of 2, which results in `total`/`count` = 1.75 when `result()` is called on the metric. Loss calculated with `tf.keras.Metrics` is scaled by an additional factor equal to the number of replicas in sync.
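The arithmetic in this example, written out in plain Python:

```python
# Per-example losses produced on each of the two replicas.
replica_a = [2, 3]
replica_b = [4, 5]
GLOBAL_BATCH_SIZE = 4

# Loss scaling: sum each replica's losses, divide by the global batch size.
scaled_a = sum(replica_a) / GLOBAL_BATCH_SIZE   # 1.25
scaled_b = sum(replica_b) / GLOBAL_BATCH_SIZE   # 2.25

# A Mean metric fed the two scaled values keeps total = 3.50, count = 2.
mean_result = (scaled_a + scaled_b) / 2         # 1.75
print(scaled_a, scaled_b, mean_result)
```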
### Examples and tutorials
Here are some examples for using distribution strategies with custom training loops:
1. [Tutorial](/tutorials/distribute/training_loops) to train MNIST using `MirroredStrategy`.
2. [DenseNet](https://github.com/tensorflow/examples/blob/master/tensorflow_examples/models/densenet/distributed_train.py) example using `MirroredStrategy`.
3. [BERT](https://github.com/tensorflow/models/blob/master/official/nlp/bert/run_classifier.py) example trained using `MirroredStrategy` and `TPUStrategy`. This example is particularly helpful for understanding how to load from a checkpoint and generate periodic checkpoints during distributed training.
4. [NCF](https://github.com/tensorflow/models/blob/master/official/recommendation/ncf_keras_main.py) example trained using `MirroredStrategy`, which can be enabled with the `keras_use_ctl` flag.
5. [NMT](https://github.com/tensorflow/examples/blob/master/tensorflow_examples/models/nmt_with_attention/distributed_train.py) example trained using `MirroredStrategy`.
More examples are listed in the [Distribution strategy guide](https://tensorflow.google.cn/guide/distribute_strategy#examples_and_tutorials).
## Next steps
Try out the new [`tf.distribute.Strategy`](https://tensorflow.google.cn/api_docs/python/tf/distribute/Strategy) API on your models.
# Multi-worker training with Keras
> Original: [https://tensorflow.google.cn/tutorials/distribute/multi_worker_with_keras](https://tensorflow.google.cn/tutorials/distribute/multi_worker_with_keras)
## Overview
This tutorial demonstrates multi-worker distributed training with a Keras model using the [`tf.distribute.Strategy`](https://tensorflow.google.cn/api_docs/python/tf/distribute/Strategy) API. With strategies designed specifically for multi-worker training, a Keras model built to run on a single worker can seamlessly work on multiple workers with minimal code changes.
The [Distributed training in TensorFlow](https://tensorflow.google.cn/guide/distribute_strategy) guide gives an overview of the distribution strategies TensorFlow supports, for anyone interested in a deeper understanding of the [`tf.distribute.Strategy`](https://tensorflow.google.cn/api_docs/python/tf/distribute/Strategy) APIs.
## Setup
First, set up TensorFlow and the necessary imports.
```py
!pip install -q tf-nightly
import tensorflow_datasets as tfds
import tensorflow as tf
tfds.disable_progress_bar()
```
## Preparing the dataset
Now, let's prepare the MNIST dataset from [TensorFlow Datasets](https://tensorflow.google.cn/datasets). The [MNIST dataset](http://yann.lecun.com/exdb/mnist/) comprises 60,000 training examples and 10,000 test examples of handwritten digits 0-9, formatted as 28x28-pixel monochrome images.
```py
BUFFER_SIZE = 10000
BATCH_SIZE = 64
def make_datasets_unbatched():
# Scale the MNIST data from the (0, 255] range to the (0., 1.] range.
def scale(image, label):
image = tf.cast(image, tf.float32)
image /= 255
return image, label
datasets, info = tfds.load(name='mnist',
with_info=True,
as_supervised=True)
return datasets['train'].map(scale).cache().shuffle(BUFFER_SIZE)
train_datasets = make_datasets_unbatched().batch(BATCH_SIZE)
```
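The `scale` mapping above, shown on plain numbers: each pixel is cast to float and divided by 255, so values in (0, 255] land in (0., 1.]:

```python
# Same arithmetic as the dataset's `scale` map, minus the tensors.
def scale_pixel(pixel):
    return float(pixel) / 255

print(scale_pixel(255), scale_pixel(51))
```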
## Build the Keras model
Here we use the [`tf.keras.Sequential`](https://tensorflow.google.cn/api_docs/python/tf/keras/Sequential) API to build and compile a simple convolutional neural network Keras model to train with our MNIST dataset.
Note: For a more comprehensive walkthrough of building Keras models, see the [TensorFlow Keras guide](https://tensorflow.google.cn/guide/keras#sequential_model).
```py
def build_and_compile_cnn_model():
model = tf.keras.Sequential([
tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
tf.keras.layers.MaxPooling2D(),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(10)
])
model.compile(
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
metrics=['accuracy'])
return model
```
Let's first try training the model for a small number of epochs and observe the results on a single worker to make sure everything works correctly. You should expect the loss to drop and the accuracy to approach 1.0 as training progresses.
```py
single_worker_model = build_and_compile_cnn_model()
single_worker_model.fit(x=train_datasets, epochs=3, steps_per_epoch=5)
```
```py
Epoch 1/3
5/5 [==============================] - 1s 15ms/step - loss: 2.3390 - accuracy: 0.0211
Epoch 2/3
5/5 [==============================] - 0s 14ms/step - loss: 2.3315 - accuracy: 0.0368
Epoch 3/3
5/5 [==============================] - 0s 13ms/step - loss: 2.3271 - accuracy: 0.0484
<tensorflow.python.keras.callbacks.History at 0x7fb5d055e358>
```
## Multi-worker configuration
Now let's enter the world of multi-worker training. In TensorFlow, the `TF_CONFIG` environment variable is required for training on multiple machines, each of which may have a different role. `TF_CONFIG` is used to specify the cluster configuration for each worker that is part of the cluster.
There are two components of `TF_CONFIG`: `cluster` and `task`. `cluster` provides information about the training cluster, which is a dict consisting of different types of jobs such as `worker`. In multi-worker training, there is usually one worker that takes on a little more responsibility, such as saving checkpoints and writing summary files for TensorBoard, in addition to what a regular worker does. Such a worker is referred to as the "chief" worker, and it is customary for the `worker` with `index` 0 to be appointed as the chief worker (in fact, this is how [`tf.distribute.Strategy`](https://tensorflow.google.cn/api_docs/python/tf/distribute/Strategy) is implemented). `task`, on the other hand, provides information about the current task.
In this example, we set the task `type` to `"worker"` and the task `index` to `0`. This means the machine with this setting is the first worker, which will be appointed as the chief worker and do more work than the other workers. Note that the other machines will need the `TF_CONFIG` environment variable set as well, and it should have the same `cluster` dict but a different task `type` or `index` depending on each machine's role.
For illustration purposes, this tutorial shows how to set a `TF_CONFIG` with 2 workers on `localhost`. In practice, users would create multiple workers on external IP addresses/ports and set `TF_CONFIG` on each worker appropriately.
Warning: Do not execute the following code in Colab. TensorFlow's runtime will attempt to create a gRPC server at the specified IP address and port, which will likely fail.
```py
os.environ['TF_CONFIG'] = json.dumps({
    'cluster': {
        'worker': ["localhost:12345", "localhost:23456"]
    },
    'task': {'type': 'worker', 'index': 0}
})
```
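For the second machine in the same two-worker cluster, only the task index changes. A minimal sketch (standard-library only, hypothetical port values carried over from above) of that worker's `TF_CONFIG`, and of how a process can parse the variable back to recover its own role:

```py
import json
import os

# Hypothetical TF_CONFIG for the second worker: the `cluster` dict is
# identical on every machine; only the task index differs.
os.environ['TF_CONFIG'] = json.dumps({
    'cluster': {
        'worker': ["localhost:12345", "localhost:23456"]
    },
    'task': {'type': 'worker', 'index': 1}
})

# A process can parse the variable back to find its own role:
tf_config = json.loads(os.environ['TF_CONFIG'])
num_workers = len(tf_config['cluster']['worker'])
# By convention, the worker with index 0 is the "chief".
is_chief = (tf_config['task']['type'] == 'worker' and
            tf_config['task']['index'] == 0)
print(num_workers, is_chief)  # 2 False
```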
Note that while the learning rate is fixed in this example, in general you may need to adjust it based on the global batch size.
## Choose the right strategy
In TensorFlow, distributed training consists of synchronous training, where the training steps are synchronized across the workers and replicas, and asynchronous training, where the training steps are not strictly synchronized.
`MultiWorkerMirroredStrategy`, which is the recommended strategy for synchronous multi-worker training, will be demonstrated in this guide.
To train the model, use an instance of [`tf.distribute.experimental.MultiWorkerMirroredStrategy`](https://tensorflow.google.cn/api_docs/python/tf/distribute/experimental/MultiWorkerMirroredStrategy). `MultiWorkerMirroredStrategy` creates copies of all variables in the model's layers on each device across all workers. It uses `CollectiveOps`, a TensorFlow op for collective communication, to aggregate gradients and keep the variables in sync. The [`tf.distribute.Strategy` guide](https://tensorflow.google.cn/guide/distribute_strategy) has more details about this strategy.
```py
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
```
```py
WARNING:tensorflow:Collective ops is not configured at program startup. Some performance features may not be enabled.
Warning:tensorflow:Collective ops is not configured at program startup. Some performance features may not be enabled.
INFO:tensorflow:Using MirroredStrategy with devices ('/device:CPU:0',)
INFO:tensorflow:Using MirroredStrategy with devices ('/device:CPU:0',)
INFO:tensorflow:Single-worker MultiWorkerMirroredStrategy with local_devices = ('/device:CPU:0',), communication = CollectiveCommunication.AUTO
INFO:tensorflow:Single-worker MultiWorkerMirroredStrategy with local_devices = ('/device:CPU:0',), communication = CollectiveCommunication.AUTO
```
Note: `TF_CONFIG` is parsed and TensorFlow's gRPC servers are started at the time [`MultiWorkerMirroredStrategy.__init__()`](https://tensorflow.google.cn/api_docs/python/tf/distribute/experimental/MultiWorkerMirroredStrategy#__init__) is called, so the `TF_CONFIG` environment variable must be set before a [`tf.distribute.Strategy`](https://tensorflow.google.cn/api_docs/python/tf/distribute/Strategy) instance is created.
`MultiWorkerMirroredStrategy` provides multiple implementations via the [`CollectiveCommunication`](https://github.com/tensorflow/tensorflow/blob/a385a286a930601211d78530734368ccb415bee4/tensorflow/python/distribute/cross_device_ops.py#L928) parameter. `RING` implements ring-based collectives using gRPC as the cross-host communication layer. `NCCL` uses [Nvidia's NCCL](https://developer.nvidia.com/nccl) to implement collectives. `AUTO` defers the choice to the runtime. The best choice of collective implementation depends on the number and kind of GPUs and the network interconnect in the cluster.
## Train the model with MultiWorkerMirroredStrategy
With the integration of the [`tf.distribute.Strategy`](https://tensorflow.google.cn/api_docs/python/tf/distribute/Strategy) API into [`tf.keras`](https://tensorflow.google.cn/api_docs/python/tf/keras), the only change you need to make to distribute the training to multiple workers is wrapping the model building and the `model.compile()` call inside `strategy.scope()`. The distribution strategy's scope dictates how and where the variables are created; in the case of `MultiWorkerMirroredStrategy`, the variables created are `MirroredVariable`s, and they are replicated on each of the workers.
Note: In this Colab, the following code runs with the expected result, but this is effectively single-machine training since `TF_CONFIG` is not set. Once you have set `TF_CONFIG` in your own example, you should expect speed-ups from training on multiple machines.
```py
NUM_WORKERS = 2
# Since `tf.data.Dataset.batch` expects the global batch size,
# the batch size here is scaled up by the number of workers.
# Previously we used 64; now it becomes 128.
GLOBAL_BATCH_SIZE = 64 * NUM_WORKERS

# Creation of the dataset needs to happen after the
# MultiWorkerMirroredStrategy object is instantiated.
train_datasets = make_datasets_unbatched().batch(GLOBAL_BATCH_SIZE)

with strategy.scope():
  # Model building/compiling needs to be within `strategy.scope()`.
  multi_worker_model = build_and_compile_cnn_model()

# Keras' `model.fit()` trains the model with the specified number of epochs
# and number of steps per epoch. Note that the numbers here are for
# demonstration purposes only and are not sufficient to produce a model
# of high quality.
multi_worker_model.fit(x=train_datasets, epochs=3, steps_per_epoch=5)
```
```py
Epoch 1/3
5/5 [==============================] - 3s 23ms/step - loss: 2.3042 - accuracy: 0.1243
Epoch 2/3
5/5 [==============================] - 0s 18ms/step - loss: 2.3129 - accuracy: 0.0801
Epoch 3/3
5/5 [==============================] - 0s 19ms/step - loss: 2.2974 - accuracy: 0.1253
<tensorflow.python.keras.callbacks.History at 0x7fb5a03fd828>
```
### Dataset sharding and batch size
In multi-worker training, sharding the data into multiple parts is needed to ensure convergence and performance. Note, however, that in the code snippet above, the datasets are sent directly to `model.fit` without needing to be sharded; this is because the [`tf.distribute.Strategy`](https://tensorflow.google.cn/api_docs/python/tf/distribute/Strategy) API takes care of dataset sharding automatically in multi-worker training.
If you prefer manual sharding for your training, automatic sharding can be turned off via the [`tf.data.experimental.DistributeOptions`](https://tensorflow.google.cn/api_docs/python/tf/data/experimental/DistributeOptions) API.
```py
options = tf.data.Options()
options.experimental_distribute.auto_shard_policy = tf.data.experimental.AutoShardPolicy.OFF
train_datasets_no_auto_shard = train_datasets.with_options(options)
```
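For intuition about what sharding does: `Dataset.shard(num_shards, index)` keeps every `num_shards`-th element starting at position `index`. A minimal pure-Python sketch of that selection rule (illustrative only, not the actual tf.data implementation):

```py
def shard(elements, num_shards, index):
    # Keep the elements whose position modulo num_shards equals
    # this worker's index -- the selection rule shard() applies.
    return [x for pos, x in enumerate(elements) if pos % num_shards == index]

data = list(range(10))
worker0 = shard(data, 2, 0)
worker1 = shard(data, 2, 1)
print(worker0)  # [0, 2, 4, 6, 8]
print(worker1)  # [1, 3, 5, 7, 9]
```

Together the two workers cover every example exactly once, which is why sharding preserves convergence.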
Another thing to notice is the batch size of the `datasets`. In the code snippet above, we use `GLOBAL_BATCH_SIZE = 64 * NUM_WORKERS`, which is `NUM_WORKERS` times as large as the size for a single worker, because the effective per-worker batch size is the global batch size (the parameter passed in [`tf.data.Dataset.batch()`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#batch)) divided by the number of workers; with this change we keep the per-worker batch size the same as before.
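The batch-size arithmetic above can be checked in isolation. A tiny plain-Python sketch, with the values taken from the snippet above:

```py
NUM_WORKERS = 2
PER_WORKER_BATCH_SIZE = 64  # the single-worker batch size used earlier

# `tf.data.Dataset.batch()` receives the *global* batch size...
GLOBAL_BATCH_SIZE = PER_WORKER_BATCH_SIZE * NUM_WORKERS

# ...and each worker then effectively processes global / num_workers
# examples per step, i.e. the same 64 as before.
effective_per_worker = GLOBAL_BATCH_SIZE // NUM_WORKERS
print(GLOBAL_BATCH_SIZE, effective_per_worker)  # 128 64
```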
## Performance
You now have a Keras model that is all set up to run in multiple workers with `MultiWorkerMirroredStrategy`. You can try the following techniques to tune the performance of multi-worker training.
* `MultiWorkerMirroredStrategy` provides multiple [collective communication implementations](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/distribute/cross_device_ops.py). `RING` implements ring-based collectives using gRPC as the cross-host communication layer. `NCCL` uses [Nvidia's NCCL](https://developer.nvidia.com/nccl) to implement collectives. `AUTO` defers the choice to the runtime. The best choice of collective implementation depends on the number and kind of GPUs and the network interconnect in the cluster. To override the automatic choice, specify a valid value for the `communication` parameter of `MultiWorkerMirroredStrategy`'s constructor, e.g. `communication=tf.distribute.experimental.CollectiveCommunication.NCCL`.
* Cast the variables to `tf.float` if possible. The official ResNet model includes an [example](https://github.com/tensorflow/models/blob/8367cf6dabe11adf7628541706b660821f397dce/official/resnet/resnet_model.py#L466) of how this can be done.
## Fault tolerance
In synchronous training, the cluster would fail if one of the workers fails and no failure-recovery mechanism exists. Using Keras with [`tf.distribute.Strategy`](https://tensorflow.google.cn/api_docs/python/tf/distribute/Strategy) comes with the advantage of fault tolerance in cases where workers die or are otherwise unstable. We do this by preserving the training state in the distributed file system of your choice, such that upon restart of the instance that previously failed or was preempted, the training state is recovered.
Since all the workers are kept in sync in terms of training epochs and steps, other workers will need to wait for the failed or preempted worker to restart before they can continue.
### The ModelCheckpoint callback
To take advantage of fault tolerance in multi-worker training, provide a [`tf.keras.callbacks.ModelCheckpoint`](https://tensorflow.google.cn/api_docs/python/tf/keras/callbacks/ModelCheckpoint) instance when calling [`tf.keras.Model.fit()`](https://tensorflow.google.cn/api_docs/python/tf/keras/Model#fit). The callback stores the checkpoint and training state in the directory corresponding to the `filepath` argument of `ModelCheckpoint`.
```py
# Replace the `filepath` argument with a path in the file system
# accessible by all workers.
callbacks = [tf.keras.callbacks.ModelCheckpoint(filepath='/tmp/keras-ckpt')]

with strategy.scope():
  multi_worker_model = build_and_compile_cnn_model()
multi_worker_model.fit(x=train_datasets,
                       epochs=3,
                       steps_per_epoch=5,
                       callbacks=callbacks)
```
```py
Epoch 1/3
4/5 [=======================>......] - ETA: 0s - loss: 2.2830 - accuracy: 0.1810
/tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py:2289: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`Model.state_updates` will be removed in a future version. '
/tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py:1377: UserWarning: `layer.updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
warnings.warn('`layer.updates` will be removed in a future version. '
INFO:tensorflow:Assets written to: /tmp/keras-ckpt/assets
INFO:tensorflow:Assets written to: /tmp/keras-ckpt/assets
5/5 [==============================] - 4s 170ms/step - loss: 2.2852 - accuracy: 0.1790
Epoch 2/3
4/5 [=======================>......] - ETA: 0s - loss: 2.2871 - accuracy: 0.1758INFO:tensorflow:Assets written to: /tmp/keras-ckpt/assets
INFO:tensorflow:Assets written to: /tmp/keras-ckpt/assets
5/5 [==============================] - 1s 155ms/step - loss: 2.2869 - accuracy: 0.1797
Epoch 3/3
4/5 [=======================>......] - ETA: 0s - loss: 2.2876 - accuracy: 0.2041INFO:tensorflow:Assets written to: /tmp/keras-ckpt/assets
INFO:tensorflow:Assets written to: /tmp/keras-ckpt/assets
5/5 [==============================] - 1s 155ms/step - loss: 2.2872 - accuracy: 0.2064
<tensorflow.python.keras.callbacks.History at 0x7fb5a03fd668>
```
If a worker gets preempted, the whole cluster pauses until the preempted worker is restarted. Once the worker rejoins the cluster, the other workers will also restart. Now, every worker reads the checkpoint file that was previously saved and picks up its former state, thereby allowing the cluster to get back in sync and then continue training.
If you inspect the directory containing the `filepath` you specified in `ModelCheckpoint`, you may notice some temporarily generated checkpoint files. Those files are needed for recovering the previously lost instances, and they will be removed by the library at the end of [`tf.keras.Model.fit()`](https://tensorflow.google.cn/api_docs/python/tf/keras/Model#fit) upon successful exiting of your multi-worker training.
## See also
1. The [Distributed Training in TensorFlow](https://tensorflow.google.cn/guide/distribute_strategy) guide provides an overview of the available distribution strategies.
2. The official [ResNet50](https://github.com/tensorflow/models/blob/master/official/resnet/imagenet_main.py) model, which can be trained using either `MirroredStrategy` or `MultiWorkerMirroredStrategy`.
# Multi-worker training with Estimator
> Source: [https://tensorflow.google.cn/tutorials/distribute/multi_worker_with_estimator](https://tensorflow.google.cn/tutorials/distribute/multi_worker_with_estimator)
**Note:** These docs were translated by the TensorFlow community. Since community translations are best-effort, there is no guarantee that they are accurate and up to date with the [official English documentation](https://tensorflow.google.cn/?hl=en). If you have suggestions to improve this translation, please submit a pull request to the [tensorflow/docs](https://github.com/tensorflow/docs) GitHub repository. To volunteer to write or review translations, join the [docs-zh-cn@tensorflow.org Google Group](https://groups.google.com/a/tensorflow.org/forum/#!forum/docs-zh-cn).
## Overview
This tutorial demonstrates how [`tf.distribute.Strategy`](https://tensorflow.google.cn/api_docs/python/tf/distribute/Strategy) can be used for distributed multi-worker training. If you write your code using [`tf.estimator`](https://tensorflow.google.cn/api_docs/python/tf/estimator) and you're interested in scaling beyond a single machine with high performance, this tutorial is for you.
Before getting started, please read the [`tf.distribute.Strategy` guide](https://tensorflow.google.cn/guide/distribute_strategy). The [multi-GPU training tutorial](https://tensorflow.google.cn/tutorials/distribute/keras) is also relevant, because this tutorial uses the same model.
## Setup
First, set up TensorFlow and the imports that will be used.
```py
import tensorflow_datasets as tfds
import tensorflow as tf
tfds.disable_progress_bar()
import os, json
```
## Input function
This tutorial uses the MNIST dataset from [TensorFlow Datasets](https://tensorflow.google.cn/datasets). The code here is similar to the [multi-GPU training tutorial](https://tensorflow.google.cn/tutorials/distribute/keras), with one key difference: when using Estimator for multi-worker training, it is necessary to shard the dataset by the number of workers to ensure model convergence. The input data is sharded by worker index, so that each worker processes `1/num_workers` distinct portions of the dataset.
```py
BUFFER_SIZE = 10000
BATCH_SIZE = 64

def input_fn(mode, input_context=None):
  datasets, info = tfds.load(name='mnist',
                             with_info=True,
                             as_supervised=True)
  mnist_dataset = (datasets['train'] if mode == tf.estimator.ModeKeys.TRAIN else
                   datasets['test'])

  def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255
    return image, label

  if input_context:
    mnist_dataset = mnist_dataset.shard(input_context.num_input_pipelines,
                                        input_context.input_pipeline_id)
  return mnist_dataset.map(scale).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
```
Another reasonable approach to achieve convergence would be to shuffle the dataset with a distinct seed at each worker.
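A pure-Python sketch of that seed-per-worker idea (hypothetical scheme, for illustration only; with tf.data you would pass a worker-dependent `seed` to `Dataset.shuffle`): each worker sees the full dataset, but in its own order.

```py
import random

def worker_ordering(data, worker_index, base_seed=42):
    # Hypothetical: derive a distinct seed per worker so that each
    # worker shuffles the *full* dataset into its own order.
    rng = random.Random(base_seed + worker_index)
    ordering = list(data)
    rng.shuffle(ordering)
    return ordering

data = list(range(8))
order0 = worker_ordering(data, 0)
order1 = worker_ordering(data, 1)
# Unlike sharding, both workers still see every example exactly once.
print(sorted(order0) == data, sorted(order1) == data)
```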
## Multi-worker configuration
One of the key differences in this tutorial (compared to the [multi-GPU training tutorial](https://tensorflow.google.cn/tutorials/distribute/keras)) is the multi-worker setup. The standard way to specify the cluster configuration to each worker that is part of the cluster is the `TF_CONFIG` environment variable.
There are two components of `TF_CONFIG`: `cluster` and `task`. `cluster` provides information about the entire cluster, namely the workers and parameter servers in the cluster. `task` provides information about the current task. In this example, the task `type` is worker and the task `index` is 0.
For illustration purposes, this tutorial shows how to set a `TF_CONFIG` with 2 workers on localhost. In practice, you would create multiple workers on external IP addresses and ports, and set `TF_CONFIG` on each worker appropriately, i.e. modify the task `index`.
Warning: Do not execute the following code in Colab. TensorFlow's runtime will try to create a gRPC server at the specified IP address and port, which will likely fail.
```py
os.environ['TF_CONFIG'] = json.dumps({
    'cluster': {
        'worker': ["localhost:12345", "localhost:23456"]
    },
    'task': {'type': 'worker', 'index': 0}
})
```
## Define the model
Write the layers, the optimizer, and the loss function for training. This tutorial defines the model with Keras layers, similar to the [multi-GPU training tutorial](https://tensorflow.google.cn/tutorials/distribute/keras).
```py
LEARNING_RATE = 1e-4

def model_fn(features, labels, mode):
  model = tf.keras.Sequential([
      tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
      tf.keras.layers.MaxPooling2D(),
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(64, activation='relu'),
      # No activation here: the loss below expects raw logits.
      tf.keras.layers.Dense(10)
  ])
  logits = model(features, training=False)

  if mode == tf.estimator.ModeKeys.PREDICT:
    predictions = {'logits': logits}
    return tf.estimator.EstimatorSpec(labels=labels, predictions=predictions)

  optimizer = tf.compat.v1.train.GradientDescentOptimizer(
      learning_rate=LEARNING_RATE)
  loss = tf.keras.losses.SparseCategoricalCrossentropy(
      from_logits=True, reduction=tf.keras.losses.Reduction.NONE)(labels, logits)
  loss = tf.reduce_sum(loss) * (1. / BATCH_SIZE)
  if mode == tf.estimator.ModeKeys.EVAL:
    return tf.estimator.EstimatorSpec(mode, loss=loss)

  return tf.estimator.EstimatorSpec(
      mode=mode,
      loss=loss,
      train_op=optimizer.minimize(
          loss, tf.compat.v1.train.get_or_create_global_step()))
```
Note: Although the learning rate is fixed in this example, in general it may be necessary to adjust the learning rate based on the global batch size.
## MultiWorkerMirroredStrategy
To train the model, use an instance of [`tf.distribute.experimental.MultiWorkerMirroredStrategy`](https://tensorflow.google.cn/api_docs/python/tf/distribute/experimental/MultiWorkerMirroredStrategy). `MultiWorkerMirroredStrategy` creates copies of all variables in the model's layers on each device across all workers. It uses `CollectiveOps`, a TensorFlow op for collective communication, to aggregate gradients and keep the variables in sync. More details about this strategy can be found in the [`tf.distribute.Strategy` guide](https://tensorflow.google.cn/guide/distribute_strategy).
```py
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
```
```py
INFO:tensorflow:Using MirroredStrategy with devices ('/device:GPU:0',)
INFO:tensorflow:Single-worker MultiWorkerMirroredStrategy with local_devices = ('/device:GPU:0',), communication = CollectiveCommunication.AUTO
```
## Train and evaluate the model
Next, specify the distribution strategy in the `RunConfig` for the estimator, and train and evaluate the model by invoking [`tf.estimator.train_and_evaluate`](https://tensorflow.google.cn/api_docs/python/tf/estimator/train_and_evaluate). This tutorial distributes only the training, by specifying the strategy via `train_distribute`. It is also possible to distribute the evaluation via `eval_distribute`.
```py
config = tf.estimator.RunConfig(train_distribute=strategy)

classifier = tf.estimator.Estimator(
    model_fn=model_fn, model_dir='/tmp/multiworker', config=config)
tf.estimator.train_and_evaluate(
    classifier,
    train_spec=tf.estimator.TrainSpec(input_fn=input_fn),
    eval_spec=tf.estimator.EvalSpec(input_fn=input_fn)
)
```
```py
INFO:tensorflow:Initializing RunConfig with distribution strategies.
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/multiworker', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': <tensorflow.python.distribute.collective_all_reduce_strategy.CollectiveAllReduceStrategy object at 0x7f975c17f5f8>, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_distribute_coordinator_mode': None}
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
INFO:tensorflow:The `input_fn` accepts an `input_context` which will be given by DistributionStrategy
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/data/ops/multi_device_iterator_ops.py:339: get_next_as_optional (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Iterator.get_next_as_optional()` instead.
Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/data/ops/multi_device_iterator_ops.py:339: get_next_as_optional (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Iterator.get_next_as_optional()` instead.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
Warning:tensorflow:AutoGraph could not transform <function _combine_distributed_scaffold.<locals>.<lambda> at 0x7f975c181c80> and will run it as-is.
Cause: could not parse the source code:
lambda scaffold: scaffold.ready_op, args=(grouped_scaffold,))
This error may be avoided by creating the lambda in a standalone statement.
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
Warning:tensorflow:AutoGraph could not transform <function _combine_distributed_scaffold.<locals>.<lambda> at 0x7f975c181c80> and will run it as-is.
Cause: could not parse the source code:
lambda scaffold: scaffold.ready_op, args=(grouped_scaffold,))
This error may be avoided by creating the lambda in a standalone statement.
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
Warning: AutoGraph could not transform <function _combine_distributed_scaffold.<locals>.<lambda> at 0x7f975c181c80> and will run it as-is.
Cause: could not parse the source code:
lambda scaffold: scaffold.ready_op, args=(grouped_scaffold,))
This error may be avoided by creating the lambda in a standalone statement.
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Create CheckpointSaverHook.
Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/util.py:96: DistributedIteratorV1.initialize (from tensorflow.python.distribute.input_lib) is deprecated and will be removed in a future version.
Instructions for updating:
Use the iterator's `initializer` property instead.
Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/util.py:96: DistributedIteratorV1.initialize (from tensorflow.python.distribute.input_lib) is deprecated and will be removed in a future version.
Instructions for updating:
Use the iterator's `initializer` property instead.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmp/multiworker/model.ckpt.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/multiworker/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 2.3033497, step = 0
INFO:tensorflow:loss = 2.3033497, step = 0
INFO:tensorflow:global_step/sec: 195.373
INFO:tensorflow:global_step/sec: 195.373
INFO:tensorflow:loss = 2.3039753, step = 100 (0.514 sec)
INFO:tensorflow:loss = 2.3039753, step = 100 (0.514 sec)
INFO:tensorflow:global_step/sec: 214.711
INFO:tensorflow:global_step/sec: 214.711
INFO:tensorflow:loss = 2.3031363, step = 200 (0.465 sec)
INFO:tensorflow:loss = 2.3031363, step = 200 (0.465 sec)
INFO:tensorflow:global_step/sec: 217.488
INFO:tensorflow:global_step/sec: 217.488
INFO:tensorflow:loss = 2.3034592, step = 300 (0.460 sec)
INFO:tensorflow:loss = 2.3034592, step = 300 (0.460 sec)
INFO:tensorflow:global_step/sec: 218.917
INFO:tensorflow:global_step/sec: 218.917
INFO:tensorflow:loss = 2.3013198, step = 400 (0.457 sec)
INFO:tensorflow:loss = 2.3013198, step = 400 (0.457 sec)
INFO:tensorflow:global_step/sec: 219.726
INFO:tensorflow:global_step/sec: 219.726
INFO:tensorflow:loss = 2.3037362, step = 500 (0.455 sec)
INFO:tensorflow:loss = 2.3037362, step = 500 (0.455 sec)
INFO:tensorflow:global_step/sec: 219.401
INFO:tensorflow:global_step/sec: 219.401
INFO:tensorflow:loss = 2.3062348, step = 600 (0.455 sec)
INFO:tensorflow:loss = 2.3062348, step = 600 (0.455 sec)
INFO:tensorflow:global_step/sec: 220.068
INFO:tensorflow:global_step/sec: 220.068
INFO:tensorflow:loss = 2.300187, step = 700 (0.455 sec)
INFO:tensorflow:loss = 2.300187, step = 700 (0.455 sec)
INFO:tensorflow:global_step/sec: 246.384
INFO:tensorflow:global_step/sec: 246.384
INFO:tensorflow:loss = 2.30475, step = 800 (0.405 sec)
INFO:tensorflow:loss = 2.30475, step = 800 (0.405 sec)
INFO:tensorflow:global_step/sec: 587.13
INFO:tensorflow:global_step/sec: 587.13
INFO:tensorflow:loss = 2.3031988, step = 900 (0.170 sec)
INFO:tensorflow:loss = 2.3031988, step = 900 (0.170 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 938...
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 938...
INFO:tensorflow:Saving checkpoints for 938 into /tmp/multiworker/model.ckpt.
INFO:tensorflow:Saving checkpoints for 938 into /tmp/multiworker/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 938...
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 938...
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-09-22T19:53:28Z
INFO:tensorflow:Starting evaluation at 2020-09-22T19:53:28Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/multiworker/model.ckpt-938
INFO:tensorflow:Restoring parameters from /tmp/multiworker/model.ckpt-938
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [10/100]
INFO:tensorflow:Evaluation [10/100]
INFO:tensorflow:Evaluation [20/100]
INFO:tensorflow:Evaluation [20/100]
INFO:tensorflow:Evaluation [30/100]
INFO:tensorflow:Evaluation [30/100]
INFO:tensorflow:Evaluation [40/100]
INFO:tensorflow:Evaluation [40/100]
INFO:tensorflow:Evaluation [50/100]
INFO:tensorflow:Evaluation [50/100]
INFO:tensorflow:Evaluation [60/100]
INFO:tensorflow:Evaluation [60/100]
INFO:tensorflow:Evaluation [70/100]
INFO:tensorflow:Evaluation [70/100]
INFO:tensorflow:Evaluation [80/100]
INFO:tensorflow:Evaluation [80/100]
INFO:tensorflow:Evaluation [90/100]
INFO:tensorflow:Evaluation [90/100]
INFO:tensorflow:Evaluation [100/100]
INFO:tensorflow:Evaluation [100/100]
INFO:tensorflow:Inference Time : 0.98988s
INFO:tensorflow:Inference Time : 0.98988s
INFO:tensorflow:Finished evaluation at 2020-09-22-19:53:29
INFO:tensorflow:Finished evaluation at 2020-09-22-19:53:29
INFO:tensorflow:Saving dict for global step 938: global_step = 938, loss = 2.3031592
INFO:tensorflow:Saving dict for global step 938: global_step = 938, loss = 2.3031592
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 938: /tmp/multiworker/model.ckpt-938
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 938: /tmp/multiworker/model.ckpt-938
INFO:tensorflow:Loss for final step: 1.1519132.
INFO:tensorflow:Loss for final step: 1.1519132.
({'loss': 2.3031592, 'global_step': 938}, [])
```
## Optimize training performance
You now have a multi-worker capable model and Estimator powered by [`tf.distribute.Strategy`](https://tensorflow.google.cn/api_docs/python/tf/distribute/Strategy). You can try the following techniques to optimize the performance of multi-worker training.
* *Increase the batch size:* The batch size specified here is per-GPU. In general, the largest batch size that fits the GPU memory is advisable.
* *Cast variables:* Cast the variables to `tf.float` if possible. The official ResNet model includes an [example](https://github.com/tensorflow/models/blob/8367cf6dabe11adf7628541706b660821f397dce/official/resnet/resnet_model.py#L466) of how this can be done.
* *Use collective communication:* `MultiWorkerMirroredStrategy` provides multiple [collective communication implementations](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/distribute/cross_device_ops.py).
    * `RING` implements ring-based collectives using gRPC as the cross-host communication layer.
    * `NCCL` uses [Nvidia's NCCL](https://developer.nvidia.com/nccl) to implement collectives.
    * `AUTO` defers the choice to the runtime.

  The best choice of collective implementation depends not only on the number and kind of GPUs, but also on the network interconnect in the cluster. To override the automatic choice, specify a valid value for the `communication` parameter of `MultiWorkerMirroredStrategy`'s constructor, e.g. `communication=tf.distribute.experimental.CollectiveCommunication.NCCL`.
## Other code examples
1. An [end-to-end example](https://github.com/tensorflow/ecosystem/tree/master/distribution_strategy) for multi-worker training using Kubernetes templates. This example starts with a Keras model and converts it to an Estimator using the [`tf.keras.estimator.model_to_estimator`](https://tensorflow.google.cn/api_docs/python/tf/keras/estimator/model_to_estimator) API.
2. The official [ResNet50](https://github.com/tensorflow/models/blob/master/official/resnet/imagenet_main.py) model, which can be trained using either `MirroredStrategy` or `MultiWorkerMirroredStrategy`.
# Save and load a model using a distribution strategy
> Source: [https://tensorflow.google.cn/tutorials/distribute/save_and_load](https://tensorflow.google.cn/tutorials/distribute/save_and_load)
## Overview
It's common to save and load a model during training. There are two sets of APIs for saving and loading a Keras model: a high-level API and a low-level API. This tutorial demonstrates how you can use the SavedModel APIs when using [`tf.distribute.Strategy`](https://tensorflow.google.cn/api_docs/python/tf/distribute/Strategy). To learn about SavedModel and serialization in general, please read the [saved model guide](https://tensorflow.google.cn/guide/saved_model) and the [Keras model serialization guide](https://tensorflow.google.cn/guide/keras/save_and_serialize). Let's start with a simple example:
Import dependencies:
```py
import tensorflow_datasets as tfds
import tensorflow as tf
tfds.disable_progress_bar()
```
Prepare the data and model using [`tf.distribute.Strategy`](https://tensorflow.google.cn/api_docs/python/tf/distribute/Strategy):
```py
mirrored_strategy = tf.distribute.MirroredStrategy()

def get_data():
  datasets, ds_info = tfds.load(name='mnist', with_info=True, as_supervised=True)
  mnist_train, mnist_test = datasets['train'], datasets['test']

  BUFFER_SIZE = 10000

  BATCH_SIZE_PER_REPLICA = 64
  BATCH_SIZE = BATCH_SIZE_PER_REPLICA * mirrored_strategy.num_replicas_in_sync

  def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255
    return image, label

  train_dataset = mnist_train.map(scale).cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
  eval_dataset = mnist_test.map(scale).batch(BATCH_SIZE)

  return train_dataset, eval_dataset

def get_model():
  with mirrored_strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10)
    ])
    model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  optimizer=tf.keras.optimizers.Adam(),
                  metrics=['accuracy'])
    return model
```
```py
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
```
Train the model:
```py
model = get_model()
train_dataset, eval_dataset = get_data()
model.fit(train_dataset, epochs=2)
```
```py
Epoch 1/2
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/data/ops/multi_device_iterator_ops.py:601: get_next_as_optional (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Iterator.get_next_as_optional()` instead.
Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/data/ops/multi_device_iterator_ops.py:601: get_next_as_optional (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Iterator.get_next_as_optional()` instead.
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
938/938 [==============================] - 4s 5ms/step - loss: 0.1971 - accuracy: 0.9421
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
Epoch 2/2
938/938 [==============================] - 3s 3ms/step - loss: 0.0662 - accuracy: 0.9801
<tensorflow.python.keras.callbacks.History at 0x7f96501659e8>
```
## Save and load the model
Now that you have a simple model to work with, let's take a look at the saving/loading APIs. There are two sets of APIs available:
* High-level Keras `model.save` and [`tf.keras.models.load_model`](https://tensorflow.google.cn/api_docs/python/tf/keras/models/load_model)
* Low-level [`tf.saved_model.save`](https://tensorflow.google.cn/api_docs/python/tf/saved_model/save) and [`tf.saved_model.load`](https://tensorflow.google.cn/api_docs/python/tf/saved_model/load)
### The Keras APIs
Here is an example of saving and loading a model with the Keras APIs:
```py
keras_model_path = "/tmp/keras_save"
model.save(keras_model_path) # save() should be called out of strategy scope
```
```py
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/training/tracking/tracking.py:111: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/training/tracking/tracking.py:111: Layer.updates (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
INFO:tensorflow:Assets written to: /tmp/keras_save/assets
```
Restore the model without [`tf.distribute.Strategy`](https://tensorflow.google.cn/api_docs/python/tf/distribute/Strategy):
```py
restored_keras_model = tf.keras.models.load_model(keras_model_path)
restored_keras_model.fit(train_dataset, epochs=2)
```
```py
Epoch 1/2
938/938 [==============================] - 3s 3ms/step - loss: 0.0480 - accuracy: 0.0990
Epoch 2/2
938/938 [==============================] - 2s 2ms/step - loss: 0.0334 - accuracy: 0.0989
<tensorflow.python.keras.callbacks.History at 0x7f96c54d0a58>
```
After restoring the model, you can continue training on it, without even needing to call `compile()` again, since it was compiled before saving. The model is saved in TensorFlow's standard `SavedModel` proto format. For more information, see the [`saved_model` format guide](https://tensorflow.google.cn/guide/saved_model).
Now, load the model and train it with a [`tf.distribute.Strategy`](https://tensorflow.google.cn/api_docs/python/tf/distribute/Strategy):
```py
another_strategy = tf.distribute.OneDeviceStrategy("/cpu:0")
with another_strategy.scope():
  restored_keras_model_ds = tf.keras.models.load_model(keras_model_path)
restored_keras_model_ds.fit(train_dataset, epochs=2)
```
```py
Epoch 1/2
938/938 [==============================] - 9s 9ms/step - loss: 0.0481 - accuracy: 0.0989
Epoch 2/2
938/938 [==============================] - 9s 9ms/step - loss: 0.0329 - accuracy: 0.0990
```
As you can see, loading works as expected with [`tf.distribute.Strategy`](https://tensorflow.google.cn/api_docs/python/tf/distribute/Strategy). The strategy used here does not have to be the same one used before saving.
### The [`tf.saved_model`](https://tensorflow.google.cn/api_docs/python/tf/saved_model) API
Now let's look at the lower-level API. Saving a model is similar to the Keras API:
```py
model = get_model() # get a fresh model
saved_model_path = "/tmp/tf_save"
tf.saved_model.save(model, saved_model_path)
```
```py
INFO:tensorflow:Assets written to: /tmp/tf_save/assets
```
The model can be loaded with [`tf.saved_model.load()`](https://tensorflow.google.cn/api_docs/python/tf/saved_model/load). However, since this API is lower level (and therefore supports a wider range of use cases), it does not return a Keras model. Instead, it returns an object containing functions that can be used for inference. For example:
```py
DEFAULT_FUNCTION_KEY = "serving_default"
loaded = tf.saved_model.load(saved_model_path)
inference_func = loaded.signatures[DEFAULT_FUNCTION_KEY]
```
The loaded object may contain multiple functions, each associated with a key. `"serving_default"` is the default key for the inference function of a saved Keras model. To run inference with this function:
```py
predict_dataset = eval_dataset.map(lambda image, label: image)
for batch in predict_dataset.take(1):
  print(inference_func(batch))
```
```py
{'dense_3': <tf.Tensor: shape=(64, 10), dtype=float32, numpy=
array([[ 0.17218862, 0.07492599, -0.0548683 , 0.03503785, -0.03743191,
-0.05301537, 0.01267872, -0.02870197, -0.33800656, 0.17991678],
[ 0.12937182, -0.21557797, -0.09474514, 0.39076763, -0.22147779,
-0.1787742 , 0.2154337 , 0.00788027, -0.14960325, 0.43123117],
[ 0.04755233, -0.20264567, -0.17308846, 0.19781005, -0.11123425,
-0.4295108 , 0.05442019, 0.01459119, -0.17129104, 0.04688327],
[ 0.09866484, 0.01627818, -0.08671301, 0.05742932, -0.20312837,
-0.38836166, -0.06952551, 0.05141062, -0.03084616, 0.05498504],
[ 0.00565811, -0.04239772, 0.04898138, 0.06162139, -0.16708252,
-0.12976539, -0.00474121, 0.05431085, -0.14715545, 0.07582194],
[ 0.17589626, 0.19629489, -0.2076093 , 0.02031662, -0.1619812 ,
-0.24300966, -0.0310282 , -0.00850905, -0.18514219, 0.23665032],
[-0.02653 , -0.17737214, -0.24494407, 0.20125583, -0.17153463,
-0.18641792, 0.11408111, 0.01489197, -0.099539 , 0.41159016],
[ 0.1903163 , 0.1697292 , -0.14116906, 0.1588785 , -0.04286646,
-0.19863203, -0.04836996, -0.00679918, -0.14634813, 0.14979276],
[ 0.12109621, 0.03313948, -0.1955429 , 0.23528968, -0.12369496,
-0.20725062, 0.06024174, 0.05078189, -0.158943 , 0.16846842],
[ 0.16227934, 0.06379895, -0.08847713, 0.08261362, -0.03925761,
-0.17770812, -0.043965 , 0.02072081, -0.07430968, 0.05749936],
[ 0.05508922, -0.14091367, -0.1887006 , 0.12903523, -0.13182093,
-0.11879301, 0.20175044, 0.11686974, -0.1616871 , 0.2226192 ],
[ 0.18285918, -0.01880376, -0.15778637, 0.04477023, -0.22364017,
-0.23864916, -0.06328501, 0.04380857, -0.04448643, 0.40406597],
[ 0.04721744, 0.06619421, -0.10837474, 0.1292499 , -0.17490903,
-0.17313394, -0.06603841, 0.15658481, -0.09657097, -0.04059617],
[-0.04412666, 0.02258963, 0.08539917, 0.2561011 , -0.18279126,
-0.2519745 , -0.00787598, 0.08598025, -0.21961546, 0.10189874],
[ 0.05089861, 0.06746367, -0.13205 , 0.09160744, -0.30171782,
-0.25160635, 0.08317091, 0.03015741, -0.10570806, 0.28686398],
[ 0.13625176, -0.109529 , 0.04985618, 0.08199271, -0.24280871,
-0.22908798, 0.17737128, 0.09937412, -0.31234092, 0.2290439 ],
[ 0.13812706, 0.10425253, 0.0128724 , 0.12191941, -0.09126505,
-0.13897963, -0.17568447, 0.16489705, -0.26533198, 0.06911667],
[ 0.16982701, 0.087276 , -0.17102191, 0.06745699, -0.06239565,
-0.17226742, -0.02450407, 0.10939141, -0.13510445, 0.04026298],
[-0.05762933, 0.03908077, 0.0729831 , 0.12001946, -0.12699135,
-0.37191632, -0.10294843, 0.1815257 , -0.10121268, 0.06880292],
[ 0.07649058, -0.03354908, -0.06362928, -0.00831218, -0.24217641,
-0.11137463, 0.01944396, 0.0310707 , 0.0093919 , 0.34353036],
[ 0.16107717, -0.04705916, -0.14095825, 0.05297582, -0.1485554 ,
-0.12321693, 0.07225874, 0.07695273, -0.17055047, 0.22460693],
[ 0.02565719, -0.05495968, -0.11961621, 0.03014402, -0.1645109 ,
-0.26333475, 0.07536604, 0.04426918, -0.12448484, 0.04142715],
[ 0.02295595, 0.01484419, -0.28111714, 0.05291839, -0.09908111,
-0.22002876, 0.00388122, 0.06801579, -0.03227042, 0.04201593],
[ 0.01293404, -0.15113808, -0.05814568, 0.29754263, -0.13849238,
-0.02268202, 0.16958144, 0.12881759, -0.13463333, 0.3364867 ],
[ 0.19805974, -0.01798259, -0.12835501, 0.26842418, -0.04154617,
-0.19442351, -0.08115683, 0.08586816, 0.00582654, 0.04328927],
[ 0.09159922, 0.12617984, -0.15028486, 0.23344447, -0.06932314,
-0.1483246 , -0.02017963, 0.03262286, -0.2800941 , 0.18364596],
[ 0.1528 , 0.13280275, -0.09938447, 0.03614349, -0.1096218 ,
-0.19335787, -0.04933339, -0.02397237, -0.13356304, -0.01165973],
[ 0.13618907, 0.14891617, -0.16118397, 0.10435603, -0.1831438 ,
-0.16405147, -0.14186187, 0.12581114, -0.15762964, 0.13493878],
[ 0.05534358, -0.0916103 , 0.0352111 , 0.0020496 , -0.19224274,
-0.17663556, 0.08702807, -0.08016825, -0.14833373, 0.10739949],
[ 0.02660379, -0.04472145, 0.01165188, 0.0219909 , -0.16059823,
-0.26817566, -0.09790543, 0.10905766, -0.01595427, 0.304615 ],
[ 0.08248052, -0.09962849, -0.02325149, 0.04280585, -0.20835052,
-0.2023199 , -0.0130603 , 0.07936736, 0.0494375 , 0.27143508],
[ 0.00310345, 0.04583906, -0.20415008, 0.1876276 , -0.06600557,
-0.19580218, -0.02222047, 0.07650423, -0.08899002, 0.10885157],
[ 0.0783096 , -0.01651647, -0.09479928, 0.07058451, -0.14990349,
-0.33366078, 0.0564964 , 0.01118498, -0.14589244, 0.22603557],
[ 0.04565446, 0.05590308, -0.02989801, -0.07578284, -0.09796432,
-0.20807403, -0.00954358, 0.02622838, -0.10276475, -0.05590656],
[ 0.07286316, 0.01376749, -0.18262148, 0.28560585, -0.18269306,
-0.06166455, 0.12229253, 0.11880912, -0.08595768, 0.17080015],
[ 0.12635507, -0.0836257 , 0.03501946, 0.30507207, -0.34584454,
-0.29186884, 0.26327768, 0.18378039, -0.09220086, 0.16707191],
[ 0.11742169, 0.02937749, -0.16469768, 0.31997636, -0.1280521 ,
-0.17700416, 0.05593231, 0.05017062, -0.31535 , 0.15465745],
[ 0.08975917, 0.01203279, 0.09783987, 0.06205256, -0.05648104,
-0.27429107, -0.12651348, 0.09195078, -0.2890005 , 0.08270936],
[ 0.09477694, 0.10097383, -0.05783979, 0.11597094, -0.05375554,
-0.04229444, -0.09689695, 0.08121311, -0.05716637, 0.09075539],
[-0.04117738, -0.06426363, -0.0629988 , 0.00692648, -0.30303234,
-0.28447956, -0.01935545, 0.159902 , -0.10399745, 0.17079492],
[-0.01080875, -0.04450692, -0.19694453, 0.15313052, -0.11790004,
-0.21164687, 0.16064486, 0.05443045, 0.04431828, 0.18498638],
[ 0.16398555, 0.21772492, -0.03592323, 0.15181649, -0.02455682,
-0.28267485, -0.12445807, 0.17047536, -0.19300474, -0.01467199],
[ 0.04904355, -0.0152067 , 0.09667489, -0.01841408, -0.08439851,
-0.2905228 , -0.0541675 , 0.07489735, -0.13492545, 0.1839124 ],
[ 0.2369909 , 0.08534706, -0.12017098, 0.04527019, -0.05781246,
-0.1196178 , -0.09442404, 0.01685349, -0.26979008, 0.17579612],
[ 0.04441281, -0.09139308, 0.00063404, 0.02085789, -0.17478338,
-0.1746104 , 0.21254838, 0.07575508, -0.19009903, 0.26038024],
[ 0.23913413, 0.13267268, -0.11951514, 0.13184579, -0.11442515,
-0.1563474 , -0.13503158, 0.1639925 , -0.11313978, 0.05294855],
[ 0.11768216, 0.12213368, -0.00641227, 0.1983034 , -0.10263431,
-0.10918278, -0.06888436, 0.26294842, -0.1041921 , 0.09731302],
[ 0.16183744, -0.14602011, -0.17195675, 0.1428874 , -0.26739907,
-0.3048862 , 0.06860068, 0.03065268, -0.13347332, 0.4117231 ],
[-0.02206257, 0.00734324, 0.003649 , 0.12295016, -0.22801307,
-0.23414296, -0.03367008, 0.11127277, -0.01726604, -0.0447302 ],
[ 0.10106434, 0.09055474, -0.12789255, 0.1377592 , -0.05564225,
-0.21510065, -0.09061419, -0.0219887 , -0.14411387, -0.03950592],
[ 0.12847602, -0.09453006, -0.04503661, 0.27597424, -0.17524761,
-0.05134012, 0.16526361, 0.08649909, -0.22461002, 0.45229536],
[ 0.04311011, 0.09949236, -0.04975891, 0.22421105, -0.12030718,
-0.09846736, -0.1408607 , 0.2384947 , -0.21582088, 0.01464934],
[-0.03788627, 0.04636163, 0.07747708, 0.0814044 , -0.12896554,
-0.31223392, -0.0578138 , 0.1859979 , -0.10911787, 0.15140374],
[ 0.08929176, -0.02551255, -0.06947158, 0.25500187, -0.18166143,
-0.1110489 , 0.0658811 , 0.23209906, -0.00346252, 0.27463445],
[ 0.12721871, -0.05336493, -0.01648436, 0.23337078, -0.22428553,
-0.17424905, 0.03487325, 0.28687072, 0.04055911, 0.30594033],
[ 0.18656036, -0.00513786, -0.16282284, 0.02530107, -0.17092519,
-0.24259233, 0.05227455, 0.19966123, -0.28181344, 0.14443643],
[ 0.02111852, -0.04639132, -0.01641255, 0.20416623, -0.11734181,
-0.08085347, 0.13685697, 0.10490854, -0.09023371, 0.32988763],
[ 0.06382357, 0.02803485, 0.03532831, 0.07898249, -0.10290041,
-0.2603921 , -0.03376516, 0.09166428, -0.14019875, 0.19503292],
[ 0.15105441, 0.0064583 , -0.1603775 , 0.16818096, -0.22179885,
-0.36698502, 0.12694073, -0.1294238 , -0.21702135, 0.34743598],
[ 0.11475793, -0.08016841, -0.19020993, 0.27748483, -0.13198294,
-0.22254312, 0.19926155, 0.19124901, -0.08933976, 0.25242418],
[ 0.09380357, -0.02989926, -0.01782445, 0.00312767, -0.02519768,
-0.43802148, -0.00290839, 0.04753356, -0.02965541, 0.10304467],
[ 0.20286047, -0.07675526, -0.03217752, 0.17366095, -0.13799758,
-0.27491322, 0.00279245, 0.14233288, -0.05951798, 0.36937428],
[ 0.01445094, -0.07265921, 0.10096341, 0.17594802, -0.17472097,
-0.2958681 , 0.0036519 , 0.03119059, -0.2027646 , -0.01793122],
[-0.02391969, -0.10441571, -0.00624696, 0.06563509, -0.14965585,
-0.3743796 , 0.0422266 , 0.04684277, 0.05023851, -0.07264638]],
dtype=float32)>}
```
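Each row of the `dense_3` tensor above holds one raw logit per class; the predicted label for an example is the index of its largest logit. A minimal sketch of that step using NumPy (NumPy here is an illustration, not part of the tutorial's pipeline):

```python
import numpy as np

# First row of the `dense_3` output above: one logit per class (10 classes).
logits = np.array([0.17218862, 0.07492599, -0.0548683, 0.03503785, -0.03743191,
                   -0.05301537, 0.01267872, -0.02870197, -0.33800656, 0.17991678])

# The predicted class is the index of the largest logit.
predicted_class = int(np.argmax(logits))
print(predicted_class)  # → 9
```

The same step applies per row of `inference_func(batch)` when working with the full batch.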
You can also load the model and run inference in a distributed manner:
```py
another_strategy = tf.distribute.MirroredStrategy()
with another_strategy.scope():
  loaded = tf.saved_model.load(saved_model_path)
  inference_func = loaded.signatures[DEFAULT_FUNCTION_KEY]

  dist_predict_dataset = another_strategy.experimental_distribute_dataset(
      predict_dataset)

  # Calling the function in a distributed manner
  for batch in dist_predict_dataset:
    another_strategy.run(inference_func, args=(batch,))
```
```py
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
WARNING:tensorflow:Using MirroredStrategy eagerly has significant overhead currently. We will be working on improving this in the future, but for now please wrap `call_for_each_replica` or `experimental_run` or `experimental_run_v2` inside a tf.function to get the best performance.
```
Calling the restored function is just a forward pass (prediction) on the saved model. What if you want to continue training the loaded function, or embed it in a larger model? A common practice is to wrap the loaded object into a Keras layer. Fortunately, [TF Hub](https://tensorflow.google.cn/hub) provides [hub.KerasLayer](https://github.com/tensorflow/hub/blob/master/tensorflow_hub/keras_layer.py) for exactly this purpose, as shown below:
```py
import tensorflow_hub as hub
def build_model(loaded):
  x = tf.keras.layers.Input(shape=(28, 28, 1), name='input_x')
  # Wrap what's loaded to a KerasLayer
  keras_layer = hub.KerasLayer(loaded, trainable=True)(x)
  model = tf.keras.Model(x, keras_layer)
  return model

another_strategy = tf.distribute.MirroredStrategy()
with another_strategy.scope():
  loaded = tf.saved_model.load(saved_model_path)
  model = build_model(loaded)

  model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                optimizer=tf.keras.optimizers.Adam(),
                metrics=['accuracy'])
  model.fit(train_dataset, epochs=2)
```
```py
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
Epoch 1/2
938/938 [==============================] - 3s 3ms/step - loss: 0.2059 - accuracy: 0.9393
Epoch 2/2
938/938 [==============================] - 3s 3ms/step - loss: 0.0681 - accuracy: 0.9799
```
As you can see, [`hub.KerasLayer`](https://tensorflow.google.cn/hub/api_docs/python/hub/KerasLayer) wraps the result loaded back from [`tf.saved_model.load()`](https://tensorflow.google.cn/api_docs/python/tf/saved_model/load) into a Keras layer that can be used to build another model. This is very useful for transfer learning.
### Which API should I use?
For saving, if you are working with a Keras model, it is almost always recommended to use Keras's `model.save()` API. If what you are saving is not a Keras model, then the lower-level API is your only choice.
For loading, which API you use depends on what you want to get back. If you cannot (or do not want to) get a Keras model, use [`tf.saved_model.load()`](https://tensorflow.google.cn/api_docs/python/tf/saved_model/load). Otherwise, use [`tf.keras.models.load_model()`](https://tensorflow.google.cn/api_docs/python/tf/keras/models/load_model). Note that you can only get a Keras model back if you saved a Keras model.
It is possible to mix the APIs. You can save a Keras model with `model.save`, and load it as a non-Keras model with the low-level [`tf.saved_model.load`](https://tensorflow.google.cn/api_docs/python/tf/saved_model/load).
```py
model = get_model()
# Saving the model using Keras's save() API
model.save(keras_model_path)
another_strategy = tf.distribute.MirroredStrategy()
# Loading the model using lower level API
with another_strategy.scope():
  loaded = tf.saved_model.load(keras_model_path)
```
```py
INFO:tensorflow:Assets written to: /tmp/keras_save/assets
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
```
### Caveats
There is a special case where your Keras model does not have well-defined inputs. For example, a Sequential model can be created without any input shape (`Sequential([Dense(3), ...])`). Subclassed models also do not have well-defined inputs after initialization. In this case, you should stick with the lower-level APIs for both saving and loading; otherwise you will get an error.
To check whether your model has well-defined inputs, just check whether `model.inputs` is `None`. If it is not `None`, you are all set. Input shapes are defined automatically when the model is used in `.fit`, `.evaluate`, or `.predict`, or when the model is called directly (`model(inputs)`).
Here is an example:
```py
class SubclassedModel(tf.keras.Model):
  output_name = 'output_layer'

  def __init__(self):
    super(SubclassedModel, self).__init__()
    self._dense_layer = tf.keras.layers.Dense(
        5, dtype=tf.dtypes.float32, name=self.output_name)

  def call(self, inputs):
    return self._dense_layer(inputs)

my_model = SubclassedModel()
# my_model.save(keras_model_path)  # ERROR!
tf.saved_model.save(my_model, saved_model_path)
```
```py
WARNING:tensorflow:Skipping full serialization of Keras layer <__main__.SubclassedModel object at 0x7f96b1c92320>, because it is not built.
WARNING:tensorflow:Skipping full serialization of Keras layer <tensorflow.python.keras.layers.core.Dense object at 0x7f96b1c92b70>, because it is not built.
INFO:tensorflow:Assets written to: /tmp/tf_save/assets
```

# Images
# Convolutional Neural Network (CNN)
> Original: [https://tensorflow.google.cn/tutorials/images/cnn](https://tensorflow.google.cn/tutorials/images/cnn)
**Note:** These documents were translated by the TensorFlow community. Because community translations are best-effort, there is no guarantee that they are accurate or reflect the latest [official English documentation](https://tensorflow.google.cn/?hl=en). If you have suggestions for improving this translation, please submit a pull request to the [tensorflow/docs-l10n](https://github.com/tensorflow/docs-l10n) GitHub repository. To volunteer to write or review translations, join the [docs-zh-cn@tensorflow.org Google Group](https://groups.google.com/a/tensorflow.org/forum/#!forum/docs-zh-cn).
### Import TensorFlow
```py
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
```
### Download and prepare the CIFAR10 dataset
The CIFAR10 dataset contains 60,000 color images in 10 classes, with 6,000 images in each class. The dataset is divided into 50,000 training images and 10,000 test images. The classes are mutually exclusive, with no overlap between them.
```py
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
# Normalize pixel values to the range 0 to 1.
train_images, test_images = train_images / 255.0, test_images / 255.0
```
### Verify the data
To verify that the dataset looks correct, let's plot the first 25 images from the training set and display the class name below each image.
```py
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
'dog', 'frog', 'horse', 'ship', 'truck']
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    # The CIFAR labels happen to be arrays,
    # which is why you need the extra index.
    plt.xlabel(class_names[train_labels[i][0]])
plt.show()
```
![png](img/25a15211c7a5c4ce6da843197b4b85eb.png)
### Create the convolutional base
The six lines of code below define the convolutional base using a common pattern: a stack of [Conv2D](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/Conv2D) and [MaxPooling2D](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/MaxPool2D) layers.
As input, a CNN takes tensors of shape (image_height, image_width, color_channels), ignoring the batch size. If you are new to color channels, RGB mode is a good starting point; in that mode, `color_channels` is `(R,G,B)`, one value per color channel. In this example, you will configure the CNN to process inputs of shape `(32, 32, 3)`, which is the format of CIFAR images. You can do this by passing the argument `input_shape` to the first layer.
```py
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
```
Here is the architecture of the model so far:
```py
model.summary()
```
```py
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 30, 30, 32) 896
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 15, 15, 32) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 13, 13, 64) 18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 64) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 4, 4, 64) 36928
=================================================================
Total params: 56,320
Trainable params: 56,320
Non-trainable params: 0
_________________________________________________________________
```
Above, you can see that the output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels). The width and height dimensions shrink as you go deeper into the network. The number of output channels of each Conv2D layer is controlled by its first argument (e.g., 32 or 64 above). Typically, as the width and height shrink, you can afford (computationally) to add more output channels to each Conv2D layer.
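The shapes and parameter counts in the summary above can be checked by hand: a 3×3 convolution with 'valid' padding shrinks each spatial dimension by 2, a 2×2 max pool halves it (flooring), and a Conv2D layer holds kernel_h·kernel_w·in_channels·out_channels weights plus one bias per output channel. A small sketch of that arithmetic (the helper functions are illustrative, not Keras APIs):

```python
def conv_out(size, kernel=3):
    # 'valid' padding, stride 1: output shrinks by (kernel - 1)
    return size - kernel + 1

def pool_out(size, pool=2):
    # 2x2 max pooling halves the spatial size (floor division)
    return size // pool

def conv_params(kernel, in_ch, out_ch):
    # one kernel per output channel, plus a bias per output channel
    return kernel * kernel * in_ch * out_ch + out_ch

size = 32                        # CIFAR images are 32x32
size = pool_out(conv_out(size))  # after Conv2D(32) + pooling: 15
size = pool_out(conv_out(size))  # after Conv2D(64) + pooling: 6
size = conv_out(size)            # after the last Conv2D(64): 4

print(conv_params(3, 3, 32))   # → 896
print(conv_params(3, 32, 64))  # → 18496
print(conv_params(3, 64, 64))  # → 36928
print(size * size * 64)        # → 1024 values go into the Flatten layer
```

These numbers match the `model.summary()` output line by line.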
### Add Dense layers on top
*A Dense layer is the same as a fully connected layer.*
To complete the model, you will feed the last output tensor of the convolutional base (of shape (4, 4, 64)) into one or more Dense layers to perform classification. Dense layers take vectors (1D) as input, while the current output is a 3D tensor. So you first need to flatten the 3D output to 1D, then add one or more Dense layers on top. CIFAR has 10 classes, so the final Dense layer needs 10 outputs (raw logits, since the loss used below is configured with `from_logits=True`).
```py
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))
```
Here's the complete architecture of the model:
```py
model.summary()
```
```py
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 30, 30, 32) 896
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 15, 15, 32) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 13, 13, 64) 18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 64) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 4, 4, 64) 36928
_________________________________________________________________
flatten (Flatten) (None, 1024) 0
_________________________________________________________________
dense (Dense) (None, 64) 65600
_________________________________________________________________
dense_1 (Dense) (None, 10) 650
=================================================================
Total params: 122,570
Trainable params: 122,570
Non-trainable params: 0
_________________________________________________________________
```
As you can see, the (4, 4, 64) output was flattened into a vector of shape (1024) before passing through the two Dense layers.
### Compile and train the model
```py
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))
```
```py
Epoch 1/10
1563/1563 [==============================] - 5s 3ms/step - loss: 1.5143 - accuracy: 0.4469 - val_loss: 1.2281 - val_accuracy: 0.5585
Epoch 2/10
1563/1563 [==============================] - 5s 3ms/step - loss: 1.1625 - accuracy: 0.5855 - val_loss: 1.2102 - val_accuracy: 0.5660
Epoch 3/10
1563/1563 [==============================] - 5s 3ms/step - loss: 1.0049 - accuracy: 0.6458 - val_loss: 0.9935 - val_accuracy: 0.6511
Epoch 4/10
1563/1563 [==============================] - 5s 3ms/step - loss: 0.9089 - accuracy: 0.6801 - val_loss: 0.9658 - val_accuracy: 0.6536
Epoch 5/10
1563/1563 [==============================] - 5s 3ms/step - loss: 0.8341 - accuracy: 0.7066 - val_loss: 0.9890 - val_accuracy: 0.6581
Epoch 6/10
1563/1563 [==============================] - 5s 3ms/step - loss: 0.7797 - accuracy: 0.7272 - val_loss: 0.8948 - val_accuracy: 0.6891
Epoch 7/10
1563/1563 [==============================] - 5s 3ms/step - loss: 0.7287 - accuracy: 0.7437 - val_loss: 0.9004 - val_accuracy: 0.6947
Epoch 8/10
1563/1563 [==============================] - 5s 3ms/step - loss: 0.6858 - accuracy: 0.7609 - val_loss: 0.8284 - val_accuracy: 0.7191
Epoch 9/10
1563/1563 [==============================] - 5s 3ms/step - loss: 0.6448 - accuracy: 0.7736 - val_loss: 0.8752 - val_accuracy: 0.7096
Epoch 10/10
1563/1563 [==============================] - 5s 3ms/step - loss: 0.6117 - accuracy: 0.7855 - val_loss: 0.8524 - val_accuracy: 0.7204
```
### Evaluate the model
```py
plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label = 'val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim([0.5, 1])
plt.legend(loc='lower right')
plt.show()
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
```
![png](img/9564eb108080dfcb0a0231e7db795b06.png)
```py
313/313 - 1s - loss: 0.8524 - accuracy: 0.7204
```
```py
print(test_acc)
```
```py
0.7203999757766724
```
The simple CNN you built reaches about 70% accuracy on the test set. Not bad for just a few lines of code! For another CNN style, see an example that uses the Keras subclassing API and [`tf.GradientTape`](https://tensorflow.google.cn/api_docs/python/tf/GradientTape) [here](https://tensorflow.google.cn/tutorials/quickstart/advanced).
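Since the final Dense layer outputs raw logits (the loss above is built with `from_logits=True`), per-class probabilities can be recovered by applying a softmax. A minimal NumPy sketch of that transformation, shown with made-up logits as an illustration:

```python
import numpy as np

def softmax(logits):
    # subtract the max before exponentiating, for numerical stability
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.1])   # hypothetical model outputs
probs = softmax(logits)
print(round(float(probs.sum()), 6))  # → 1.0 (probabilities sum to one)
print(int(np.argmax(probs)))         # → 0 (largest logit keeps the largest probability)
```

In TensorFlow the equivalent is `tf.nn.softmax` applied to `model.predict(...)`.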

# Image classification
> Original: [https://tensorflow.google.cn/tutorials/images/classification](https://tensorflow.google.cn/tutorials/images/classification)
This tutorial shows how to classify images of flowers. It creates an image classifier using a [`keras.Sequential`](https://tensorflow.google.cn/api_docs/python/tf/keras/Sequential) model, and loads data using [`preprocessing.image_dataset_from_directory`](https://tensorflow.google.cn/api_docs/python/tf/keras/preprocessing/image_dataset_from_directory). You will gain practical experience with the following concepts:
* Efficiently loading a dataset off disk.
* Identifying overfitting and applying techniques to mitigate it, including data augmentation and Dropout.
This tutorial follows a basic machine learning workflow:
1. Examine and understand data
2. Build an input pipeline
3. Build the model
4. Train the model
5. Test the model
6. Improve the model and repeat the process
## Import TensorFlow and other libraries
```py
import matplotlib.pyplot as plt
import numpy as np
import os
import PIL
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
```
## Download and explore the dataset
This tutorial uses a dataset of about 3,700 photos of flowers. The dataset contains 5 sub-directories, one per class:
```py
flower_photo/
daisy/
dandelion/
roses/
sunflowers/
tulips/
```
```py
import pathlib
dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True)
data_dir = pathlib.Path(data_dir)
```
```py
Downloading data from https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz
228818944/228813984 [==============================] - 5s 0us/step
```
After downloading, you should now have a copy of the dataset available. There are 3,670 total images:
```py
image_count = len(list(data_dir.glob('*/*.jpg')))
print(image_count)
```
```py
3670
```
Here are some roses:
```py
roses = list(data_dir.glob('roses/*'))
PIL.Image.open(str(roses[0]))
```
![png](img/87abb24bd5c5230158bc1ff3b3bb5624.png)
```py
PIL.Image.open(str(roses[1]))
```
![png](img/c5f05439bb7e2eb354fda7f89beadeb3.png)
And some tulips:
```py
tulips = list(data_dir.glob('tulips/*'))
PIL.Image.open(str(tulips[0]))
```
![png](img/dcd2e24d351259809e8bd2dfe61f3f59.png)
```py
PIL.Image.open(str(tulips[1]))
```
![png](img/25794664318bbd0dc1284a9ea6754d14.png)
# Load using keras.preprocessing
Let's load these images off disk using the helpful [image_dataset_from_directory](https://tensorflow.google.cn/api_docs/python/tf/keras/preprocessing/image_dataset_from_directory) utility. This will take you from a directory of images on disk to a [`tf.data.Dataset`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset) in just a couple lines of code. If you like, you can also write your own data loading code from scratch by visiting the [load images](https://tensorflow.google.cn/tutorials/load_data/images) tutorial.
## Create a dataset
Define some parameters for the loader:
```py
batch_size = 32
img_height = 180
img_width = 180
```
It's good practice to use a validation split when developing your model. Let's use 80% of the images for training, and 20% for validation.
```py
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)
```
```py
Found 3670 files belonging to 5 classes.
Using 2936 files for training.
```
```py
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="validation",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)
```
```py
Found 3670 files belonging to 5 classes.
Using 734 files for validation.
```
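The 2,936/734 counts above follow directly from holding out 20% of the 3,670 files; a quick sketch of the arithmetic in integer form (an illustration of the split, not Keras's internal code):

```python
total_images = 3670
# 20% held out for validation; integer math avoids float rounding surprises
val_count = total_images * 20 // 100     # → 734
train_count = total_images - val_count   # → 2936
print(train_count, val_count)            # → 2936 734
```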
You can find the class names in the `class_names` attribute on these datasets. These correspond to the directory names in alphabetical order.
```py
class_names = train_ds.class_names
print(class_names)
```
```py
['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']
```
## Visualize the data
Here are the first 9 images from the training dataset.
```py
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 10))
for images, labels in train_ds.take(1):
  for i in range(9):
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(images[i].numpy().astype("uint8"))
    plt.title(class_names[labels[i]])
    plt.axis("off")
```
![png](img/01e618f7715193d849381e8d78c78c09.png)
You will train a model using these datasets by passing them to `model.fit` in a moment. If you like, you can also manually iterate over the dataset and retrieve batches of images:
```py
for image_batch, labels_batch in train_ds:
  print(image_batch.shape)
  print(labels_batch.shape)
  break
```
```py
(32, 180, 180, 3)
(32,)
```
The `image_batch` is a tensor of the shape `(32, 180, 180, 3)`. This is a batch of 32 images of shape `180x180x3` (the last dimension refers to color channels RGB). The `label_batch` is a tensor of the shape `(32,)`; these are the corresponding labels for the 32 images.
You can call `.numpy()` on the `image_batch` and `labels_batch` tensors to convert them to a `numpy.ndarray`.
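With 2,936 training images and a batch size of 32, you can also predict how many batches one epoch yields; a side calculation, not part of the tutorial's code:

```python
import math

train_count, batch_size = 2936, 32
num_batches = math.ceil(train_count / batch_size)          # full + partial batches
last_batch = train_count - (num_batches - 1) * batch_size  # size of the final batch
print(num_batches, last_batch)  # → 92 24
```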
## Configure the dataset for performance
Let's make sure to use buffered prefetching so you can yield data from disk without I/O becoming blocking. These are two important methods you should use when loading data.
[`Dataset.cache()`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#cache) keeps the images in memory after they're loaded off disk during the first epoch. This will ensure the dataset does not become a bottleneck while training your model. If your dataset is too large to fit into memory, you can also use this method to create a performant on-disk cache.
[`Dataset.prefetch()`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#prefetch) overlaps data preprocessing and model execution while training.
Interested readers can learn more about both methods, as well as how to cache data to disk in the [data performance guide](https://tensorflow.google.cn/guide/data_performance#prefetching).
```py
AUTOTUNE = tf.data.experimental.AUTOTUNE
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
```
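The mechanics of prefetching can be illustrated with a toy producer/consumer sketch in plain Python. This is only a conceptual illustration, not how `tf.data` is implemented: a background thread keeps a bounded buffer filled while the main thread consumes from it, so loading and "training" overlap.

```python
import queue
import threading

def prefetch(generator, buffer_size=2):
    """Conceptual sketch of prefetching: a background thread keeps a
    bounded buffer filled while the consumer does other work."""
    q = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the stream

    def producer():
        for item in generator:
            q.put(item)  # blocks when the buffer is full
        q.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is done:
            return
        yield item

# The consumer sees the same elements, but loading overlaps consumption.
print(list(prefetch(iter(range(5)))))  # [0, 1, 2, 3, 4]
```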
## Standardize the data
The RGB channel values are in the `[0, 255]` range. This is not ideal for a neural network; in general you should seek to make your input values small. Here, you will standardize values to be in the `[0, 1]` range by using a Rescaling layer.
```py
normalization_layer = layers.experimental.preprocessing.Rescaling(1./255)
```
**Note:** The Keras Preprocessing utilities and layers introduced in this section are currently experimental and may change.
There are two ways to use this layer. You can apply it to the dataset by calling `Dataset.map`:
```py
normalized_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
image_batch, labels_batch = next(iter(normalized_ds))
first_image = image_batch[0]
# Notice the pixels values are now in `[0,1]`.
print(np.min(first_image), np.max(first_image))
```
```py
0.006427039 0.99052274
```
Or, you can include the layer inside your model definition, which can simplify deployment. Let's use the second approach here.
**Note:** You previously resized images using the `image_size` argument of `image_dataset_from_directory`. If you want to include the resizing logic in your model as well, you can use the [Resizing](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/experimental/preprocessing/Resizing) layer.
## Create the model
The model consists of three convolution blocks with a max pooling layer in each of them. On top of these is a fully connected layer with 128 units, activated by a `relu` activation function. This model has not been tuned for high accuracy; the goal of this tutorial is to show a standard approach.
```py
num_classes = 5
model = Sequential([
layers.experimental.preprocessing.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
layers.Conv2D(16, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Conv2D(32, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Conv2D(64, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Flatten(),
layers.Dense(128, activation='relu'),
layers.Dense(num_classes)
])
```
## Compile the model
For this tutorial, choose the [`optimizers.Adam`](https://tensorflow.google.cn/api_docs/python/tf/keras/optimizers/Adam) optimizer and [`losses.SparseCategoricalCrossentropy`](https://tensorflow.google.cn/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy) loss function. To view training and validation accuracy for each training epoch, pass the `metrics` argument.
```py
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
```
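To see what `SparseCategoricalCrossentropy(from_logits=True)` computes per example, here is a hand-rolled NumPy sketch (an illustration, not the Keras implementation): a numerically stabilized log-softmax over the logits, then the negative log-probability of the integer label.

```python
import numpy as np

def sparse_categorical_crossentropy(logits, label):
    """Negative log-probability of the true class, from raw logits."""
    shifted = logits - np.max(logits)  # stabilize the exponentials
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label]

loss = sparse_categorical_crossentropy(np.array([2.0, 1.0, 0.1]), label=0)
print(round(float(loss), 3))  # 0.417
```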
## Model summary
View all the layers of the network using the model's `summary` method:
```py
model.summary()
```
```py
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
rescaling_1 (Rescaling) (None, 180, 180, 3) 0
_________________________________________________________________
conv2d (Conv2D) (None, 180, 180, 16) 448
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 90, 90, 16) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 90, 90, 32) 4640
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 45, 45, 32) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 45, 45, 64) 18496
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 22, 22, 64) 0
_________________________________________________________________
flatten (Flatten) (None, 30976) 0
_________________________________________________________________
dense (Dense) (None, 128) 3965056
_________________________________________________________________
dense_1 (Dense) (None, 5) 645
=================================================================
Total params: 3,989,285
Trainable params: 3,989,285
Non-trainable params: 0
_________________________________________________________________
```
## Train the model
```py
epochs=10
history = model.fit(
train_ds,
validation_data=val_ds,
epochs=epochs
)
```
```py
Epoch 1/10
92/92 [==============================] - 3s 27ms/step - loss: 1.3816 - accuracy: 0.4077 - val_loss: 1.0884 - val_accuracy: 0.5518
Epoch 2/10
92/92 [==============================] - 1s 10ms/step - loss: 1.0222 - accuracy: 0.6039 - val_loss: 0.9661 - val_accuracy: 0.5872
Epoch 3/10
92/92 [==============================] - 1s 10ms/step - loss: 0.8417 - accuracy: 0.6778 - val_loss: 0.8763 - val_accuracy: 0.6417
Epoch 4/10
92/92 [==============================] - 1s 10ms/step - loss: 0.6234 - accuracy: 0.7691 - val_loss: 0.8961 - val_accuracy: 0.6444
Epoch 5/10
92/92 [==============================] - 1s 10ms/step - loss: 0.4066 - accuracy: 0.8580 - val_loss: 0.9164 - val_accuracy: 0.6717
Epoch 6/10
92/92 [==============================] - 1s 10ms/step - loss: 0.2379 - accuracy: 0.9234 - val_loss: 1.1665 - val_accuracy: 0.6417
Epoch 7/10
92/92 [==============================] - 1s 10ms/step - loss: 0.1372 - accuracy: 0.9571 - val_loss: 1.3581 - val_accuracy: 0.6621
Epoch 8/10
92/92 [==============================] - 1s 10ms/step - loss: 0.0802 - accuracy: 0.9789 - val_loss: 1.5392 - val_accuracy: 0.6526
Epoch 9/10
92/92 [==============================] - 1s 10ms/step - loss: 0.0405 - accuracy: 0.9918 - val_loss: 1.7072 - val_accuracy: 0.6730
Epoch 10/10
92/92 [==============================] - 1s 10ms/step - loss: 0.0311 - accuracy: 0.9925 - val_loss: 1.7984 - val_accuracy: 0.6458
```
## Visualize training results
Create plots of loss and accuracy on the training and validation sets.
```py
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(epochs)
plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
```
![png](img/14fce8d9f2fd98077c5bf9a8db1f25ec.png)
As you can see from the plots, training accuracy and validation accuracy are off by a large margin, and the model has achieved only around 60% accuracy on the validation set.
Let's look at what went wrong and try to increase the overall performance of the model.
## Overfitting
In the plots above, the training accuracy is increasing linearly over time, whereas validation accuracy stalls around 60% in the training process. Also, the difference in accuracy between training and validation accuracy is noticeable—a sign of [overfitting](https://tensorflow.google.cn/tutorials/keras/overfit_and_underfit).
When there are a small number of training examples, the model sometimes learns from noise or unwanted details in the training examples, to an extent that it negatively impacts the performance of the model on new examples. This phenomenon is known as overfitting. It means that the model will have a difficult time generalizing on a new dataset.
There are multiple ways to fight overfitting in the training process. In this tutorial, you'll use *data augmentation* and add *Dropout* to your model.
## Data augmentation
Overfitting generally occurs when there are a small number of training examples. [Data augmentation](https://tensorflow.google.cn/tutorials/images/data_augmentation) takes the approach of generating additional training data from your existing examples by augmenting them using random transformations that yield believable-looking images. This helps expose the model to more aspects of the data and generalize better.
You will implement data augmentation using experimental [Keras Preprocessing Layers](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/experimental/preprocessing/?version=nightly). These can be included inside your model like other layers, and run on the GPU.
```py
data_augmentation = keras.Sequential(
[
layers.experimental.preprocessing.RandomFlip("horizontal",
input_shape=(img_height,
img_width,
3)),
layers.experimental.preprocessing.RandomRotation(0.1),
layers.experimental.preprocessing.RandomZoom(0.1),
]
)
```
Let's visualize what a few augmented examples look like by applying data augmentation to the same image several times:
```py
plt.figure(figsize=(10, 10))
for images, _ in train_ds.take(1):
for i in range(9):
augmented_images = data_augmentation(images)
ax = plt.subplot(3, 3, i + 1)
plt.imshow(augmented_images[0].numpy().astype("uint8"))
plt.axis("off")
```
![png](img/696df8a523ce550bf177c7051cef2c75.png)
You will use data augmentation to train a model in a moment.
## Dropout
Another technique to reduce overfitting is to introduce [Dropout](https://developers.google.cn/machine-learning/glossary#dropout_regularization) to the network, a form of *regularization*.
When you apply Dropout to a layer, it randomly drops out (by setting the activation to zero) a number of output units from the layer during the training process. Dropout takes a fractional number as its input value, such as 0.1, 0.2, or 0.4. This means randomly dropping out 10%, 20%, or 40% of the output units from the applied layer.
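The effect can be sketched in NumPy (an illustration of "inverted" dropout, the variant Keras uses, rather than the actual implementation): surviving units are scaled by `1/(1 - rate)` during training so the expected activation is unchanged, and the layer is a no-op at inference.

```python
import numpy as np

def dropout(x, rate, training, rng=None):
    """Inverted-dropout sketch: zero a fraction `rate` of units during
    training and rescale the survivors; pass through at inference."""
    if not training:
        return x
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= rate  # keep each unit with prob 1 - rate
    return x * mask / (1.0 - rate)

x = np.ones(8)
print(dropout(x, 0.2, training=False))  # unchanged: all ones
print(dropout(x, 0.2, training=True))   # some zeros, survivors scaled to 1.25
```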
Let's create a new neural network using [`layers.Dropout`](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/Dropout), then train it using augmented images.
```py
model = Sequential([
data_augmentation,
layers.experimental.preprocessing.Rescaling(1./255),
layers.Conv2D(16, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Conv2D(32, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Conv2D(64, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Dropout(0.2),
layers.Flatten(),
layers.Dense(128, activation='relu'),
layers.Dense(num_classes)
])
```
## Compile and train the model
```py
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
```
```py
model.summary()
```
```py
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
sequential_1 (Sequential) (None, 180, 180, 3) 0
_________________________________________________________________
rescaling_2 (Rescaling) (None, 180, 180, 3) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 180, 180, 16) 448
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 90, 90, 16) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 90, 90, 32) 4640
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 45, 45, 32) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 45, 45, 64) 18496
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 22, 22, 64) 0
_________________________________________________________________
dropout (Dropout) (None, 22, 22, 64) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 30976) 0
_________________________________________________________________
dense_2 (Dense) (None, 128) 3965056
_________________________________________________________________
dense_3 (Dense) (None, 5) 645
=================================================================
Total params: 3,989,285
Trainable params: 3,989,285
Non-trainable params: 0
_________________________________________________________________
```
```py
epochs = 15
history = model.fit(
train_ds,
validation_data=val_ds,
epochs=epochs
)
```
```py
Epoch 1/15
92/92 [==============================] - 1s 13ms/step - loss: 1.4326 - accuracy: 0.3760 - val_loss: 1.1774 - val_accuracy: 0.5123
Epoch 2/15
92/92 [==============================] - 1s 12ms/step - loss: 1.1058 - accuracy: 0.5525 - val_loss: 0.9981 - val_accuracy: 0.5967
Epoch 3/15
92/92 [==============================] - 1s 12ms/step - loss: 1.0014 - accuracy: 0.5937 - val_loss: 0.9525 - val_accuracy: 0.6185
Epoch 4/15
92/92 [==============================] - 1s 12ms/step - loss: 0.9205 - accuracy: 0.6383 - val_loss: 0.9474 - val_accuracy: 0.6376
Epoch 5/15
92/92 [==============================] - 1s 12ms/step - loss: 0.8813 - accuracy: 0.6594 - val_loss: 0.9383 - val_accuracy: 0.6417
Epoch 6/15
92/92 [==============================] - 1s 12ms/step - loss: 0.8366 - accuracy: 0.6734 - val_loss: 0.8468 - val_accuracy: 0.6512
Epoch 7/15
92/92 [==============================] - 1s 12ms/step - loss: 0.7955 - accuracy: 0.6979 - val_loss: 0.8837 - val_accuracy: 0.6717
Epoch 8/15
92/92 [==============================] - 1s 12ms/step - loss: 0.7485 - accuracy: 0.7163 - val_loss: 0.8417 - val_accuracy: 0.6730
Epoch 9/15
92/92 [==============================] - 1s 12ms/step - loss: 0.7276 - accuracy: 0.7282 - val_loss: 0.8505 - val_accuracy: 0.6826
Epoch 10/15
92/92 [==============================] - 1s 12ms/step - loss: 0.6981 - accuracy: 0.7374 - val_loss: 0.7679 - val_accuracy: 0.6948
Epoch 11/15
92/92 [==============================] - 1s 12ms/step - loss: 0.6755 - accuracy: 0.7446 - val_loss: 0.7863 - val_accuracy: 0.6948
Epoch 12/15
92/92 [==============================] - 1s 12ms/step - loss: 0.6375 - accuracy: 0.7585 - val_loss: 0.7911 - val_accuracy: 0.7044
Epoch 13/15
92/92 [==============================] - 1s 12ms/step - loss: 0.6095 - accuracy: 0.7790 - val_loss: 0.7403 - val_accuracy: 0.7139
Epoch 14/15
92/92 [==============================] - 1s 12ms/step - loss: 0.6116 - accuracy: 0.7681 - val_loss: 0.7794 - val_accuracy: 0.7153
Epoch 15/15
92/92 [==============================] - 1s 12ms/step - loss: 0.5818 - accuracy: 0.7762 - val_loss: 0.7729 - val_accuracy: 0.7044
```
## Visualize training results
After applying data augmentation and Dropout, there is less overfitting than before, and training and validation accuracy are closer aligned.
```py
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(epochs)
plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
```
![png](img/2127fb93f97c5aaf91e991540bbe84ed.png)
## Predict on new data
Finally, let's use our model to classify an image that wasn't included in the training or validation sets.
**Note:** Data augmentation and Dropout layers are inactive at inference time.
```py
sunflower_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/592px-Red_sunflower.jpg"
sunflower_path = tf.keras.utils.get_file('Red_sunflower', origin=sunflower_url)
img = keras.preprocessing.image.load_img(
sunflower_path, target_size=(img_height, img_width)
)
img_array = keras.preprocessing.image.img_to_array(img)
img_array = tf.expand_dims(img_array, 0) # Create a batch
predictions = model.predict(img_array)
score = tf.nn.softmax(predictions[0])
print(
"This image most likely belongs to {} with a {:.2f} percent confidence."
.format(class_names[np.argmax(score)], 100 * np.max(score))
)
```
```py
Downloading data from https://storage.googleapis.com/download.tensorflow.org/example_images/592px-Red_sunflower.jpg
122880/117948 [===============================] - 0s 0us/step
This image most likely belongs to sunflowers with a 99.45 percent confidence.
```
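The `tf.nn.softmax` step above is what turns the model's raw logits into a probability distribution; here is a small NumPy sketch of the same computation, stabilized by subtracting the max logit:

```python
import numpy as np

def softmax(logits):
    """Map logits to probabilities that are positive and sum to 1."""
    shifted = np.exp(logits - np.max(logits))
    return shifted / shifted.sum()

probs = softmax(np.array([0.5, 2.0, -1.0]))
print(float(probs.sum()))     # 1.0 (up to floating point)
print(int(np.argmax(probs)))  # 1, the largest logit wins
```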

# Transfer learning with TensorFlow Hub
> Source: [https://tensorflow.google.cn/tutorials/images/transfer_learning_with_hub](https://tensorflow.google.cn/tutorials/images/transfer_learning_with_hub)
[TensorFlow Hub](https://hub.tensorflow.google.cn/) is a repository of pre-trained TensorFlow models.
This tutorial demonstrates how to:
1. Use models from TensorFlow Hub with [`tf.keras`](https://tensorflow.google.cn/api_docs/python/tf/keras)
2. Use an image classification model from TensorFlow Hub
3. Do simple transfer learning to fine-tune a model for your own image classes
## Setup
```py
import numpy as np
import time
import PIL.Image as Image
import matplotlib.pylab as plt
import tensorflow as tf
import tensorflow_hub as hub
```
## An ImageNet classifier
You'll start by using a pretrained classifier model to take an image and predict what it's an image of: no training required!
### Download the classifier
Use [`hub.KerasLayer`](https://tensorflow.google.cn/hub/api_docs/python/hub/KerasLayer) to load a [MobileNetV2 model](https://hub.tensorflow.google.cn/google/tf2-preview/mobilenet_v2/classification/2) from TensorFlow Hub. Any [compatible image classifier model](https://hub.tensorflow.google.cn/s?q=tf2&module-type=image-classification) from hub.tensorflow.google.cn will work here.
```py
classifier_model = "https://hub.tensorflow.google.cn/google/tf2-preview/mobilenet_v2/classification/4"
```

# Data augmentation
> Source: [https://tensorflow.google.cn/tutorials/images/data_augmentation](https://tensorflow.google.cn/tutorials/images/data_augmentation)
## Overview
This tutorial demonstrates data augmentation: a technique to increase the diversity of your training set by applying random (but realistic) transformations such as image rotation. You will learn how to apply data augmentation in two ways. First, you will use [Keras Preprocessing Layers](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/experimental/preprocessing/). Next, you will use [`tf.image`](https://tensorflow.google.cn/api_docs/python/tf/image).
## Setup
```py
pip install -q tf-nightly
```
```py
WARNING: You are using pip version 20.2.2; however, version 20.2.3 is available.
You should consider upgrading via the '/tmpfs/src/tf_docs_env/bin/python -m pip install --upgrade pip' command.
```
```py
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras import layers
from tensorflow.keras.datasets import mnist
```
## Download a dataset
This tutorial uses the [tf_flowers](https://tensorflow.google.cn/datasets/catalog/tf_flowers) dataset. For convenience, download the dataset using [TensorFlow Datasets](https://tensorflow.google.cn/datasets). If you would like to learn about other ways of importing data, see the [load images](https://tensorflow.google.cn/tutorials/load_data/images) tutorial.
```py
(train_ds, val_ds, test_ds), metadata = tfds.load(
'tf_flowers',
split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],
with_info=True,
as_supervised=True,
)
```
```py
Downloading and preparing dataset tf_flowers/3.0.1 (download: 218.21 MiB, generated: 221.83 MiB, total: 440.05 MiB) to /home/kbuilder/tensorflow_datasets/tf_flowers/3.0.1...
Warning:absl:Dataset tf_flowers is hosted on GCS. It will automatically be downloaded to your
local data directory. If you'd instead prefer to read directly from our public
GCS bucket (recommended if you're running on GCP), you can instead pass
`try_gcs=True` to `tfds.load` or set `data_dir=gs://tfds-data/datasets`.
Dataset tf_flowers downloaded and prepared to /home/kbuilder/tensorflow_datasets/tf_flowers/3.0.1. Subsequent calls will reuse this data.
```
The flowers dataset has five classes.
```py
num_classes = metadata.features['label'].num_classes
print(num_classes)
```
```py
5
```
Let's retrieve an image from the dataset and use it to demonstrate data augmentation.
```py
get_label_name = metadata.features['label'].int2str
image, label = next(iter(train_ds))
_ = plt.imshow(image)
_ = plt.title(get_label_name(label))
```
![png](img/aa45f39cd51486760afc706f90cf0afa.png)
## Use Keras preprocessing layers
**Note:** The [Keras Preprocessing Layers](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/experimental/preprocessing) introduced in this section are currently experimental.
### Resizing and rescaling
You can use preprocessing layers to [resize](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/experimental/preprocessing/Resizing) your images to a consistent shape, and to [rescale](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/experimental/preprocessing/Rescaling) pixel values.
```py
IMG_SIZE = 180
resize_and_rescale = tf.keras.Sequential([
layers.experimental.preprocessing.Resizing(IMG_SIZE, IMG_SIZE),
layers.experimental.preprocessing.Rescaling(1./255)
])
```
**Note:** The rescaling layer above standardizes pixel values to `[0,1]`. If instead you wanted `[-1,1]`, you would write `Rescaling(1./127.5, offset=-1)`.
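A quick NumPy check of the two rescaling conventions mentioned in the note:

```python
import numpy as np

pixels = np.array([0.0, 127.5, 255.0])

to_unit = pixels / 255.0             # Rescaling(1./255): [0, 255] -> [0, 1]
to_symmetric = pixels / 127.5 - 1.0  # Rescaling(1./127.5, offset=-1): [0, 255] -> [-1, 1]

print(to_unit.tolist())       # [0.0, 0.5, 1.0]
print(to_symmetric.tolist())  # [-1.0, 0.0, 1.0]
```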
You can see the result of applying these layers to an image.
```py
result = resize_and_rescale(image)
_ = plt.imshow(result)
```
![png](img/35228c04a07ff13d63e7c28043db3950.png)
You can verify the pixels are in `[0-1]`.
```py
print("Min and max pixel values:", result.numpy().min(), result.numpy().max())
```
```py
Min and max pixel values: 0.0 1.0
```
### Data augmentation
You can use preprocessing layers for data augmentation as well.
Let's create a few preprocessing layers and apply them repeatedly to the same image.
```py
data_augmentation = tf.keras.Sequential([
layers.experimental.preprocessing.RandomFlip("horizontal_and_vertical"),
layers.experimental.preprocessing.RandomRotation(0.2),
])
```
```py
# Add the image to a batch
image = tf.expand_dims(image, 0)
```
```py
plt.figure(figsize=(10, 10))
for i in range(9):
augmented_image = data_augmentation(image)
ax = plt.subplot(3, 3, i + 1)
plt.imshow(augmented_image[0])
plt.axis("off")
```
![png](img/30586460013d859e496dd27ce6b18cbc.png)
There are a variety of preprocessing [layers](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/experimental/preprocessing) you can use for data augmentation including `layers.RandomContrast`, `layers.RandomCrop`, `layers.RandomZoom`, and others.
### Two options to use the preprocessing layers
There are two ways you can use these preprocessing layers, with important tradeoffs.
#### Option 1: Make the preprocessing layers part of your model
```py
model = tf.keras.Sequential([
resize_and_rescale,
data_augmentation,
layers.Conv2D(16, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
# Rest of your model
])
```
There are two important points to be aware of in this case:
* Data augmentation will run on-device, synchronously with the rest of your layers, and benefit from GPU acceleration.
* When you export your model using `model.save`, the preprocessing layers will be saved along with the rest of your model. If you later deploy this model, it will automatically standardize images (according to the configuration of your layers). This can save you from the effort of having to reimplement that logic server-side.
**Note:** Data augmentation is inactive at test time so input images will only be augmented during calls to `model.fit` (not `model.evaluate` or `model.predict`).
#### Option 2: Apply the preprocessing layers to your dataset
```py
aug_ds = train_ds.map(
lambda x, y: (resize_and_rescale(x, training=True), y))
```
With this approach, you use [`Dataset.map`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#map) to create a dataset that yields batches of augmented images. In this case:
* Data augmentation will happen asynchronously on the CPU, and is non-blocking. You can overlap the training of your model on the GPU with data preprocessing, using [`Dataset.prefetch`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#prefetch), shown below.
* In this case the preprocessing layers will not be exported with the model when you call `model.save`. You will need to attach them to your model before saving it or reimplement them server-side. After training, you can attach the preprocessing layers before export.
You can find an example of the first option in the [image classification](https://tensorflow.google.cn/tutorials/images/classification) tutorial. Let's demonstrate the second option here.
### Apply the preprocessing layers to the datasets
Configure the train, validation, and test datasets with the preprocessing layers you created above. You will also configure the datasets for performance, using parallel reads and buffered prefetching to yield batches from disk without I/O becoming blocking. You can learn more about dataset performance in the [Better performance with the tf.data API](https://tensorflow.google.cn/guide/data_performance) guide.
**Note:** Data augmentation should only be applied to the training set.
```py
batch_size = 32
AUTOTUNE = tf.data.experimental.AUTOTUNE
def prepare(ds, shuffle=False, augment=False):
# Resize and rescale all datasets
ds = ds.map(lambda x, y: (resize_and_rescale(x), y),
num_parallel_calls=AUTOTUNE)
if shuffle:
ds = ds.shuffle(1000)
# Batch all datasets
ds = ds.batch(batch_size)
# Use data augmentation only on the training set
if augment:
ds = ds.map(lambda x, y: (data_augmentation(x, training=True), y),
num_parallel_calls=AUTOTUNE)
# Use buffered prefetching on all datasets
return ds.prefetch(buffer_size=AUTOTUNE)
```
```py
train_ds = prepare(train_ds, shuffle=True, augment=True)
val_ds = prepare(val_ds)
test_ds = prepare(test_ds)
```
### Train a model
For completeness, you will now train a model using these datasets. This model has not been tuned for accuracy (the goal is to show you the mechanics).
```py
model = tf.keras.Sequential([
layers.Conv2D(16, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Conv2D(32, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Conv2D(64, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Flatten(),
layers.Dense(128, activation='relu'),
layers.Dense(num_classes)
])
```
```py
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
```
```py
epochs=5
history = model.fit(
train_ds,
validation_data=val_ds,
epochs=epochs
)
```
```py
Epoch 1/5
92/92 [==============================] - 30s 315ms/step - loss: 1.5078 - accuracy: 0.3428 - val_loss: 1.0809 - val_accuracy: 0.6240
Epoch 2/5
92/92 [==============================] - 28s 303ms/step - loss: 1.0781 - accuracy: 0.5724 - val_loss: 0.9762 - val_accuracy: 0.6322
Epoch 3/5
92/92 [==============================] - 28s 295ms/step - loss: 1.0083 - accuracy: 0.5900 - val_loss: 0.9570 - val_accuracy: 0.6376
Epoch 4/5
92/92 [==============================] - 28s 300ms/step - loss: 0.9537 - accuracy: 0.6116 - val_loss: 0.9081 - val_accuracy: 0.6485
Epoch 5/5
92/92 [==============================] - 28s 301ms/step - loss: 0.8816 - accuracy: 0.6525 - val_loss: 0.8353 - val_accuracy: 0.6594
```
```py
loss, acc = model.evaluate(test_ds)
print("Accuracy", acc)
```
```py
12/12 [==============================] - 1s 83ms/step - loss: 0.8226 - accuracy: 0.6567
Accuracy 0.6566757559776306
```
### Custom data augmentation
You can also create custom data augmentation layers. This tutorial shows two ways of doing so. First, you will create a [`layers.Lambda`](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/Lambda) layer. This is a good way to write concise code. Next, you will write a new layer via [subclassing](https://tensorflow.google.cn/guide/keras/custom_layers_and_models), which gives you more control. Both layers will randomly invert the colors in an image, according to some probability.
```py
def random_invert_img(x, p=0.5):
  # With probability `p`, invert the pixel values; otherwise return x unchanged.
  if tf.random.uniform([]) < p:
    x = (255 - x)
  return x
```
```py
def random_invert(factor=0.5):
return layers.Lambda(lambda x: random_invert_img(x, factor))
random_invert = random_invert()
```
```py
plt.figure(figsize=(10, 10))
for i in range(9):
augmented_image = random_invert(image)
ax = plt.subplot(3, 3, i + 1)
plt.imshow(augmented_image[0].numpy().astype("uint8"))
plt.axis("off")
```
![png](img/5c6f6f5e851c052e9e53969cd0419cbb.png)
Next, implement a custom layer by [subclassing](https://tensorflow.google.cn/guide/keras/custom_layers_and_models).
```py
class RandomInvert(layers.Layer):
  def __init__(self, factor=0.5, **kwargs):
    super().__init__(**kwargs)
    self.factor = factor

  def call(self, x):
    # Use the stored probability rather than the default.
    return random_invert_img(x, self.factor)
```
```py
_ = plt.imshow(RandomInvert()(image)[0])
```
![png](img/8142c6b01c1a35d86e4ace60827bcce8.png)
Both of these layers can be used as described in options 1 and 2 above.
## Using tf.image
The above `layers.preprocessing` utilities are convenient. For finer control, you can write your own data augmentation pipelines or layers using [`tf.data`](https://tensorflow.google.cn/api_docs/python/tf/data) and [`tf.image`](https://tensorflow.google.cn/api_docs/python/tf/image). You may also want to check out [TensorFlow Addons Image: Operations](https://tensorflow.google.cn/addons/tutorials/image_ops) and [TensorFlow I/O: Color Space Conversions](https://tensorflow.google.cn/io/tutorials/colorspace).
Since the flowers dataset was previously configured with data augmentation, let's reimport it to start fresh.
```py
(train_ds, val_ds, test_ds), metadata = tfds.load(
'tf_flowers',
split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],
with_info=True,
as_supervised=True,
)
```
Retrieve an image to work with.
```py
image, label = next(iter(train_ds))
_ = plt.imshow(image)
_ = plt.title(get_label_name(label))
```
![png](img/cfa82b128c103151f142dae7b5ddecda.png)
Let's use the following function to visualize and compare the original and augmented images side-by-side.
```py
def visualize(original, augmented):
fig = plt.figure()
plt.subplot(1,2,1)
plt.title('Original image')
plt.imshow(original)
plt.subplot(1,2,2)
plt.title('Augmented image')
plt.imshow(augmented)
```
### Data augmentation
### Flipping the image
Flip the image either vertically or horizontally.
```py
flipped = tf.image.flip_left_right(image)
visualize(image, flipped)
```
![png](img/dda6acab76c9a017bbe16c3bebb8e54c.png)
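For an HWC array, a horizontal flip is just a reversal of the width axis; here is a NumPy equivalent of `tf.image.flip_left_right`:

```python
import numpy as np

image_hwc = np.arange(6).reshape(2, 3, 1)  # a tiny 2x3 single-channel "image"
flipped = image_hwc[:, ::-1, :]            # reverse the width (second) axis

print(image_hwc[:, :, 0].tolist())  # [[0, 1, 2], [3, 4, 5]]
print(flipped[:, :, 0].tolist())    # [[2, 1, 0], [5, 4, 3]]
```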
### Grayscale the image
Grayscale an image.
```py
grayscaled = tf.image.rgb_to_grayscale(image)
visualize(image, tf.squeeze(grayscaled))
_ = plt.colorbar()
```
![png](img/1d2f7cb104afa8ee05f37076045f9195.png)
### Saturate the image
Saturate an image by providing a saturation factor.
```py
saturated = tf.image.adjust_saturation(image, 3)
visualize(image, saturated)
```
![png](img/7ef992617c160736f94c086cc0a754d5.png)
### Change image brightness
Change the brightness of image by providing a brightness factor.
```py
bright = tf.image.adjust_brightness(image, 0.4)
visualize(image, bright)
```
![png](img/e46db7cde2b53be53d302c4b00d582a5.png)
### Center crop the image
Crop the image from center up to the image part you desire.
```py
cropped = tf.image.central_crop(image, central_fraction=0.5)
visualize(image,cropped)
```
![png](img/fe72873df8e5156872c578827579ba34.png)
### Rotate the image
Rotate an image by 90 degrees.
```py
rotated = tf.image.rot90(image)
visualize(image, rotated)
```
![png](img/f769d692ddcca3810cad6e32307d9b3a.png)
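`tf.image.rot90` rotates counter-clockwise in the (height, width) plane; `np.rot90` does the same for a NumPy HWC array. A shape-only sketch:

```python
import numpy as np

image_hwc = np.arange(6).reshape(2, 3, 1)
rotated = np.rot90(image_hwc, k=1, axes=(0, 1))  # rotate in the spatial plane only

print(image_hwc.shape)  # (2, 3, 1)
print(rotated.shape)    # (3, 2, 1): height and width are swapped
```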
### Apply augmentation to a dataset
As before, apply data augmentation to a dataset using [`Dataset.map`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#map).
```py
def resize_and_rescale(image, label):
image = tf.cast(image, tf.float32)
image = tf.image.resize(image, [IMG_SIZE, IMG_SIZE])
image = (image / 255.0)
return image, label
```
```py
def augment(image,label):
image, label = resize_and_rescale(image, label)
# Add 6 pixels of padding
image = tf.image.resize_with_crop_or_pad(image, IMG_SIZE + 6, IMG_SIZE + 6)
# Random crop back to the original size
image = tf.image.random_crop(image, size=[IMG_SIZE, IMG_SIZE, 3])
image = tf.image.random_brightness(image, max_delta=0.5) # Random brightness
image = tf.clip_by_value(image, 0, 1)
return image, label
```
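The pad-then-random-crop step used in `augment` above can be sketched in pure Python (nested lists stand in for single-channel image tensors; an illustrative toy, not the `tf.image` implementation):

```python
import random

def resize_with_pad(image, target_h, target_w):
    """Zero-pad a grid of pixels out to (target_h, target_w), centered."""
    h, w = len(image), len(image[0])
    padded = [[0] * target_w for _ in range(target_h)]
    top, left = (target_h - h) // 2, (target_w - w) // 2
    for i in range(h):
        for j in range(w):
            padded[top + i][left + j] = image[i][j]
    return padded

def random_crop(image, crop_h, crop_w):
    """Crop a (crop_h, crop_w) window at a random offset."""
    top = random.randint(0, len(image) - crop_h)
    left = random.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

img = [[1, 2], [3, 4]]
padded = resize_with_pad(img, 8, 8)   # the 2x2 image inside an 8x8 zero frame
cropped = random_crop(padded, 2, 2)   # back to the original 2x2 size
print(len(cropped), len(cropped[0]))  # 2 2
```

Because the crop offset is random, each epoch sees a slightly shifted version of the image.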
### Configure the datasets
```py
train_ds = (
train_ds
.shuffle(1000)
.map(augment, num_parallel_calls=AUTOTUNE)
.batch(batch_size)
.prefetch(AUTOTUNE)
)
```
```py
val_ds = (
val_ds
.map(resize_and_rescale, num_parallel_calls=AUTOTUNE)
.batch(batch_size)
.prefetch(AUTOTUNE)
)
```
```py
test_ds = (
test_ds
.map(resize_and_rescale, num_parallel_calls=AUTOTUNE)
.batch(batch_size)
.prefetch(AUTOTUNE)
)
```
These datasets can now be used to train a model as shown previously.
## Next steps
This tutorial demonstrated data augmentation using [Keras Preprocessing Layers](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/experimental/preprocessing/) and [`tf.image`](https://tensorflow.google.cn/api_docs/python/tf/image). To learn how to include preprocessing layers inside your model, see the [Image classification](https://tensorflow.google.cn/tutorials/images/classification) tutorial. You may also be interested in learning how preprocessing layers can help you classify text, as shown in the [Basic text classification](https://tensorflow.google.cn/tutorials/keras/text_classification) tutorial. You can learn more about [`tf.data`](https://tensorflow.google.cn/api_docs/python/tf/data) in this [guide](https://tensorflow.google.cn/guide/data), and you can learn how to configure your input pipelines for performance [here](https://tensorflow.google.cn/guide/data_performance).

# Image segmentation
> Original: [https://tensorflow.google.cn/tutorials/images/segmentation](https://tensorflow.google.cn/tutorials/images/segmentation)
This tutorial focuses on the task of image segmentation, using a modified [U-Net](https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/).
## What is image segmentation?
So far you have seen image classification, where the network's task is to assign a label or class to an input image. However, sometimes you want to know where an object is located in the image, the shape of that object, which pixel belongs to which object, and so on. In this case you want to segment the image, that is, assign a label to every pixel of the image. The task of image segmentation is therefore to train a neural network to output a pixel-wise mask of the image. This helps in understanding the image at a much lower level, the pixel level. Image segmentation has many applications in areas such as medical imaging, self-driving cars, and satellite imaging.
The dataset used in this tutorial is the [Oxford-IIIT Pet dataset](https://www.robots.ox.ac.uk/%7Evgg/data/pets/), created by Parkhi *et al.* The dataset consists of images, their corresponding labels, and pixel-wise masks. The masks are simply labels for each pixel. Each pixel falls into one of three classes:
* Class 1: pixel belonging to the pet.
* Class 2: pixel on the border of the pet.
* Class 3: none of the above / surrounding pixel.
```py
pip install -q git+https://github.com/tensorflow/examples.git
```
```py
WARNING: You are using pip version 20.2.2; however, version 20.2.3 is available.
You should consider upgrading via the '/tmpfs/src/tf_docs_env/bin/python -m pip install --upgrade pip' command.
```
```py
import tensorflow as tf
```
```py
from tensorflow_examples.models.pix2pix import pix2pix
import tensorflow_datasets as tfds
tfds.disable_progress_bar()
from IPython.display import clear_output
import matplotlib.pyplot as plt
```
## Download the Oxford-IIIT Pets dataset
The dataset is already included in TensorFlow Datasets; all you need to do is download it. The segmentation masks were added in version 3.0.0, which is why that particular version is used here.
```py
dataset, info = tfds.load('oxford_iiit_pet:3.*.*', with_info=True)
```
```py
Downloading and preparing dataset oxford_iiit_pet/3.2.0 (download: 773.52 MiB, generated: 774.69 MiB, total: 1.51 GiB) to /home/kbuilder/tensorflow_datasets/oxford_iiit_pet/3.2.0...
Shuffling and writing examples to /home/kbuilder/tensorflow_datasets/oxford_iiit_pet/3.2.0.incompleteXSR11A/oxford_iiit_pet-train.tfrecord
Shuffling and writing examples to /home/kbuilder/tensorflow_datasets/oxford_iiit_pet/3.2.0.incompleteXSR11A/oxford_iiit_pet-test.tfrecord
Dataset oxford_iiit_pet downloaded and prepared to /home/kbuilder/tensorflow_datasets/oxford_iiit_pet/3.2.0\. Subsequent calls will reuse this data.
```
The following code performs a simple augmentation of flipping an image. The image is then normalized to [0, 1]. Finally, as mentioned above, each pixel in the segmentation mask is labeled with one of {1, 2, 3}. For convenience, 1 is subtracted from the segmentation mask, resulting in the labels {0, 1, 2}.
```py
def normalize(input_image, input_mask):
input_image = tf.cast(input_image, tf.float32) / 255.0
input_mask -= 1
return input_image, input_mask
```
```py
@tf.function
def load_image_train(datapoint):
input_image = tf.image.resize(datapoint['image'], (128, 128))
input_mask = tf.image.resize(datapoint['segmentation_mask'], (128, 128))
if tf.random.uniform(()) > 0.5:
input_image = tf.image.flip_left_right(input_image)
input_mask = tf.image.flip_left_right(input_mask)
input_image, input_mask = normalize(input_image, input_mask)
return input_image, input_mask
```
```py
def load_image_test(datapoint):
input_image = tf.image.resize(datapoint['image'], (128, 128))
input_mask = tf.image.resize(datapoint['segmentation_mask'], (128, 128))
input_image, input_mask = normalize(input_image, input_mask)
return input_image, input_mask
```
The dataset already contains the required train/test splits, so we continue to use the same splits here.
```py
TRAIN_LENGTH = info.splits['train'].num_examples
BATCH_SIZE = 64
BUFFER_SIZE = 1000
STEPS_PER_EPOCH = TRAIN_LENGTH // BATCH_SIZE
```
```py
train = dataset['train'].map(load_image_train, num_parallel_calls=tf.data.experimental.AUTOTUNE)
test = dataset['test'].map(load_image_test)
```
```py
train_dataset = train.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()
train_dataset = train_dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
test_dataset = test.batch(BATCH_SIZE)
```
Let's take a look at an image example and its corresponding mask from the dataset.
```py
def display(display_list):
plt.figure(figsize=(15, 15))
title = ['Input Image', 'True Mask', 'Predicted Mask']
for i in range(len(display_list)):
plt.subplot(1, len(display_list), i+1)
plt.title(title[i])
plt.imshow(tf.keras.preprocessing.image.array_to_img(display_list[i]))
plt.axis('off')
plt.show()
```
```py
for image, mask in train.take(1):
sample_image, sample_mask = image, mask
display([sample_image, sample_mask])
```
![png](img/a8a6734d5e53ebf66610af0af887bc96.png)
## Define the model
The model being used here is a modified U-Net. A U-Net consists of an encoder (downsampler) and a decoder (upsampler). To learn robust features and reduce the number of trainable parameters, a pretrained model can be used as the encoder. The encoder for this task will therefore be a pretrained MobileNetV2 model, whose intermediate outputs will be used. The decoder will be the upsample block already implemented in the [Pix2pix tutorial](https://github.com/tensorflow/examples/blob/master/tensorflow_examples/models/pix2pix/pix2pix.py) in the TensorFlow Examples repository.
The reason to output three channels is that there are three possible labels for each pixel. Think of this as multi-class classification where each pixel is classified into one of three classes.
```py
OUTPUT_CHANNELS = 3
```
As mentioned, the encoder is a pretrained MobileNetV2 model, prepared and ready to use in [tf.keras.applications](https://tensorflow.google.cn/versions/r2.0/api_docs/python/tf/keras/applications). The encoder consists of specific outputs from intermediate layers of the model. Note that the encoder will not be trained during the training process.
```py
base_model = tf.keras.applications.MobileNetV2(input_shape=[128, 128, 3], include_top=False)
# Use the activations of these layers
layer_names = [
'block_1_expand_relu', # 64x64
'block_3_expand_relu', # 32x32
'block_6_expand_relu', # 16x16
'block_13_expand_relu', # 8x8
'block_16_project', # 4x4
]
layers = [base_model.get_layer(name).output for name in layer_names]
# Create the feature extraction model
down_stack = tf.keras.Model(inputs=base_model.input, outputs=layers)
down_stack.trainable = False
```
```py
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_128_no_top.h5
9412608/9406464 [==============================] - 0s 0us/step
```
The decoder/upsampler is simply a series of upsample blocks implemented in TensorFlow examples.
```py
up_stack = [
pix2pix.upsample(512, 3), # 4x4 -> 8x8
pix2pix.upsample(256, 3), # 8x8 -> 16x16
pix2pix.upsample(128, 3), # 16x16 -> 32x32
pix2pix.upsample(64, 3), # 32x32 -> 64x64
]
```
```py
def unet_model(output_channels):
inputs = tf.keras.layers.Input(shape=[128, 128, 3])
x = inputs
  # Downsampling through the model
skips = down_stack(x)
x = skips[-1]
skips = reversed(skips[:-1])
  # Upsampling and establishing the skip connections
for up, skip in zip(up_stack, skips):
x = up(x)
concat = tf.keras.layers.Concatenate()
x = concat([x, skip])
  # This is the last layer of the model
last = tf.keras.layers.Conv2DTranspose(
output_channels, 3, strides=2,
padding='same') #64x64 -> 128x128
x = last(x)
return tf.keras.Model(inputs=inputs, outputs=x)
```
## Train the model
Now, all that is left to do is to compile and train the model. The loss being used here is `losses.sparse_categorical_crossentropy`. This loss is used because the network is trying to assign each pixel a label, just like in multi-class prediction. In the true segmentation mask, each pixel has a value of {0, 1, 2}. The network here outputs three channels. Essentially, each channel is trying to learn to predict a class, and `losses.sparse_categorical_crossentropy` is the recommended loss for such a scenario. Using the network's output, the label assigned to a pixel is the channel with the highest value. This is what the `create_mask` function, defined below, does.
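The channel-wise argmax that the `create_mask` function defined below performs can be sketched in pure Python (a toy example with two pixels and three channel scores each; nested lists stand in for tensors):

```python
def create_mask(pred):
    """For each pixel, pick the index of the channel with the highest score."""
    return [[max(range(len(px)), key=lambda c: px[c]) for px in row]
            for row in pred]

# One row of two pixels, three channel scores per pixel.
logits = [[[0.1, 2.0, -1.0],
           [3.0, 0.5,  0.2]]]
print(create_mask(logits)[0])  # [1, 0]
```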
```py
model = unet_model(OUTPUT_CHANNELS)
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
```
Have a quick look at the resulting model architecture:
```py
tf.keras.utils.plot_model(model, show_shapes=True)
```
![png](img/fc1492a9c4124dcf0d9fb207c0a323d0.png)
Let's try out the model to see what it predicts before training.
```py
def create_mask(pred_mask):
pred_mask = tf.argmax(pred_mask, axis=-1)
pred_mask = pred_mask[..., tf.newaxis]
return pred_mask[0]
```
```py
def show_predictions(dataset=None, num=1):
if dataset:
for image, mask in dataset.take(num):
pred_mask = model.predict(image)
display([image[0], mask[0], create_mask(pred_mask)])
else:
display([sample_image, sample_mask,
create_mask(model.predict(sample_image[tf.newaxis, ...]))])
```
```py
show_predictions()
```
![png](img/79de81de8fa8f26b206d9f7e2e29232f.png)
Let's observe how the model improves while it is training. To accomplish this, a callback is defined below.
```py
class DisplayCallback(tf.keras.callbacks.Callback):
def on_epoch_end(self, epoch, logs=None):
clear_output(wait=True)
show_predictions()
print ('\nSample Prediction after epoch {}\n'.format(epoch+1))
```
```py
EPOCHS = 20
VAL_SUBSPLITS = 5
VALIDATION_STEPS = info.splits['test'].num_examples//BATCH_SIZE//VAL_SUBSPLITS
model_history = model.fit(train_dataset, epochs=EPOCHS,
steps_per_epoch=STEPS_PER_EPOCH,
validation_steps=VALIDATION_STEPS,
validation_data=test_dataset,
callbacks=[DisplayCallback()])
```
![png](img/dd1b792428257ee1ffcb4e02d4e81c11.png)
```py
Sample Prediction after epoch 20
57/57 [==============================] - 3s 54ms/step - loss: 0.1308 - accuracy: 0.9401 - val_loss: 0.3246 - val_accuracy: 0.8903
```
```py
loss = model_history.history['loss']
val_loss = model_history.history['val_loss']
epochs = range(EPOCHS)
plt.figure()
plt.plot(epochs, loss, 'r', label='Training loss')
plt.plot(epochs, val_loss, 'bo', label='Validation loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss Value')
plt.ylim([0, 1])
plt.legend()
plt.show()
```
![png](img/12bbad2792cbf9031cf0f5c0e54b36a3.png)
## Make predictions
Let's make some predictions. In the interest of saving time, the number of epochs was kept small, but you may set it higher to achieve more accurate results.
```py
show_predictions(test_dataset, 3)
```
![png](img/a3923a442896cffee97920f98141a84c.png)
![png](img/8fcdc694ecba49a443b3d3fa3db737c8.png)
![png](img/58c58ebd47eeea7849c83cacae4000e9.png)
## Next steps
Now that you have an understanding of what image segmentation is and how it works, you can try this tutorial out with different intermediate layer outputs, or even different pretrained models. You may also challenge yourself by trying out the [Carvana](https://www.kaggle.com/c/carvana-image-masking-challenge/overview) image masking challenge hosted on Kaggle.
You may also want to see the [Tensorflow Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection) for other models you can retrain on your own data.

# Text

# Word embeddings
> Original: [https://tensorflow.google.cn/tutorials/text/word_embeddings](https://tensorflow.google.cn/tutorials/text/word_embeddings)
This tutorial introduces word embeddings. It contains complete code to train word embeddings from scratch on a small dataset, and to visualize these embeddings using the [Embedding Projector](http://projector.tensorflow.org) (shown in the image below).
![Screenshot of the embedding projector](img/16ea92d12fa8170f3e79e4c56f9affd1.png)
## Representing text as numbers
Machine learning models take vectors (arrays of numbers) as input. When working with text, the first thing we must do is come up with a strategy to convert strings to numbers (or to "vectorize" the text) before feeding it to the model. In this section, we will look at three strategies for doing so.
### One-hot encodings
As a first idea, we might "one-hot" encode each word in our vocabulary. Consider the sentence "The cat sat on the mat". The vocabulary (or unique words) in this sentence is (cat, mat, on, sat, the). To represent each word, we will create a zero vector with length equal to the vocabulary, then place a one in the index that corresponds to the word. This approach is shown in the following diagram.
![Diagram of one-hot encodings](img/717d3c9c631162f5b991acff83eda7bc.png)
To create a vector that contains the encoding of the sentence, we could then concatenate the one-hot vectors for each word.
Key point: this approach is inefficient. A one-hot encoded vector is sparse (meaning most indices are zero). Imagine we have 10,000 words in the vocabulary. To one-hot encode each word, we would create a vector where 99.99% of the elements are zero.
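A minimal pure-Python sketch of this one-hot scheme, using the vocabulary from the example sentence:

```python
def one_hot(word, vocab):
    """Encode `word` as a zero vector with a single 1 at its vocabulary index."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

vocab = ['cat', 'mat', 'on', 'sat', 'the']
print(one_hot('cat', vocab))  # [1, 0, 0, 0, 0]
print(one_hot('sat', vocab))  # [0, 0, 0, 1, 0]
```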
### Encode each word with a unique number
A second approach we might try is to encode each word using a unique number. Continuing the example above, we could assign 1 to "cat", 2 to "mat", and so on. We could then encode the sentence "The cat sat on the mat" as a dense vector like [5, 1, 4, 3, 5, 2]. This approach is efficient: instead of a sparse vector, we now have a dense one (where all elements are full).
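A minimal pure-Python sketch of this integer-encoding scheme, reproducing the example above:

```python
# Assign each vocabulary word a unique integer, as in the text.
vocab = {'cat': 1, 'mat': 2, 'on': 3, 'sat': 4, 'the': 5}

def encode(sentence):
    """Map each (lowercased) word to its integer id."""
    return [vocab[w] for w in sentence.lower().split()]

print(encode('The cat sat on the mat'))  # [5, 1, 4, 3, 5, 2]
```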
There are two downsides to this approach, however:
* The integer encoding is arbitrary (it does not capture any relationship between words).
* An integer encoding can be challenging for a model to interpret. A linear classifier, for example, learns a single weight for each feature. Because there is no relationship between the similarity of any two words and the similarity of their encodings, this feature-weight combination is not meaningful.
### Word embeddings
Word embeddings give us a way to use an efficient, dense representation in which similar words have a similar encoding. Importantly, we do not have to specify this encoding by hand. An embedding is a dense vector of floating point values (the length of the vector is a parameter you specify). Instead of specifying the values for the embedding manually, they are trainable parameters (weights learned by the model during training, in the same way a model learns weights for a dense layer). It is common to see word embeddings that are 8-dimensional (for small datasets), and up to 1024-dimensional when working with large datasets. A higher-dimensional embedding can capture fine-grained relationships between words, but takes more data to learn.
![Diagram of an embedding](img/4341c4ebffdd0a35a50322abd93518de.png)
Above is a diagram of a word embedding. Each word is represented as a 4-dimensional vector of floating point values. Another way to think of an embedding is as a "lookup table". After these weights have been learned, we can encode each word by looking up the dense vector it corresponds to in the table.
## Setup
```py
import tensorflow as tf
```
```py
from tensorflow import keras
from tensorflow.keras import layers
import tensorflow_datasets as tfds
tfds.disable_progress_bar()
```
## Using the Embedding layer
Keras makes it easy to use word embeddings. Let's take a look at the [Embedding](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/Embedding) layer.
The Embedding layer can be understood as a lookup table that maps from integer indices (which stand for specific words) to dense vectors (their embeddings). The dimensionality (or width) of the embedding is a parameter you can experiment with to see what works well for your problem, much in the same way you would experiment with the number of neurons in a Dense layer.
```py
embedding_layer = layers.Embedding(1000, 5)
```
When you create an Embedding layer, the weights for the embedding are randomly initialized (just like any other layer). During training, they are gradually adjusted via backpropagation. Once trained, the learned word embeddings will roughly encode similarities between words (as they were learned for the specific problem your model is trained on).
If you pass an integer to an Embedding layer, the result replaces each integer with the vector from the embedding table:
```py
result = embedding_layer(tf.constant([1,2,3]))
result.numpy()
```
```py
array([[ 0.02629578, 0.0097797 , -0.04365711, 0.03760537, 0.0260709 ],
[ 0.03876719, 0.01541508, -0.0483237 , 0.03976549, 0.04153169],
[ 0.03035608, 0.0410546 , -0.03654389, -0.01073235, 0.02143143]],
dtype=float32)
```
For text or sequence problems, the Embedding layer takes a 2D tensor of integers, of shape `(samples, sequence_length)`, where each entry is a sequence of integers. It can embed sequences of variable lengths. You could feed into the Embedding layer above batches with shapes `(32, 10)` (batch of 32 sequences of length 10) or `(64, 15)` (batch of 64 sequences of length 15).
The returned tensor has one more axis than the input; the embedding vectors are aligned along the new last axis. Pass it a `(2, 3)` input batch and the output is `(2, 3, N)`.
```py
result = embedding_layer(tf.constant([[0,1,2],[3,4,5]]))
result.shape
```
```py
TensorShape([2, 3, 5])
```
When given a batch of sequences as input, an Embedding layer returns a 3D floating point tensor of shape `(samples, sequence_length, embedding_dimensionality)`. To convert from this sequence of variable length to a fixed representation, there are a variety of standard approaches. You could use an RNN, attention, or pooling layer before passing it to a Dense layer. This tutorial uses pooling because it is the simplest. The [Text classification with an RNN](/tutorials/text/text_classification_rnn) tutorial is a good next step.
## Learning embeddings from scratch
In this tutorial, you will train a sentiment classifier on IMDB movie reviews. In the process, the model will learn embeddings from scratch. We will use a preprocessed dataset.
To load a text dataset from scratch, see the [Loading text tutorial](https://tensorflow.google.cn/tutorials/load_data/text).
```py
(train_data, test_data), info = tfds.load(
'imdb_reviews/subwords8k',
split = (tfds.Split.TRAIN, tfds.Split.TEST),
with_info=True, as_supervised=True)
```
```py
WARNING:absl:TFDS datasets with text encoding are deprecated and will be removed in a future version. Instead, you should use the plain text version and tokenize the text using `tensorflow_text` (See: https://www.tensorflow.org/tutorials/tensorflow_text/intro#tfdata_example)
```
Get the encoder (`tfds.features.text.SubwordTextEncoder`), and have a quick look at the vocabulary.
The "_" in the vocabulary represents a space. Note how the vocabulary includes whole words (ending with "_") and partial words which it can use to build larger words:
```py
encoder = info.features['text'].encoder
encoder.subwords[:20]
```
```py
['the_',
', ',
'. ',
'a_',
'and_',
'of_',
'to_',
's_',
'is_',
'br',
'in_',
'I_',
'that_',
'this_',
'it_',
' /><',
' />',
'was_',
'The_',
'as_']
```
Movie reviews can be different lengths. We will use the `padded_batch` method to standardize the lengths of the reviews.
```py
train_batches = train_data.shuffle(1000).padded_batch(10)
test_batches = test_data.shuffle(1000).padded_batch(10)
```
As imported, the text of reviews is integer-encoded (each integer represents a specific word or word-part in the vocabulary).
Note the trailing zeros, because the batch is padded to the longest example.
```py
train_batch, train_labels = next(iter(train_batches))
train_batch.numpy()
```
```py
array([[5739, 46, 674, ..., 0, 0, 0],
[ 274, 2732, 1289, ..., 0, 0, 0],
[ 19, 118, 874, ..., 0, 0, 0],
...,
[ 324, 12, 118, ..., 0, 0, 0],
[ 12, 31, 165, ..., 0, 0, 0],
[ 131, 196, 7968, ..., 0, 0, 0]])
```
### Create a simple model
We will use the [Keras Sequential API](https://tensorflow.google.cn/guide/keras) to define our model. In this case it is a "Continuous bag of words" style model.
* Next, the Embedding layer takes the integer-encoded vocabulary and looks up the embedding vector for each word-index. These vectors are learned as the model trains. The vectors add a dimension to the output array. The resulting dimensions are: `(batch, sequence, embedding)`.
* Next, a GlobalAveragePooling1D layer returns a fixed-length output vector for each example by averaging over the sequence dimension. This allows the model to handle input of variable length in the simplest way possible.
* This fixed-length output vector is piped through a fully-connected (Dense) layer with 16 hidden units.
* The last layer is densely connected with a single output node. Using the sigmoid activation function, this value is a float between 0 and 1, representing a probability (or confidence level) that the review is positive.
Caution: this model doesn't use masking, so the zero-padding is used as part of the input and the padding length may therefore affect the output. To fix this, see the [masking and padding guide](https://tensorflow.google.cn/guide/keras/masking_and_padding).
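The sequence-averaging done by `GlobalAveragePooling1D` can be sketched in pure Python (nested lists stand in for the `(batch, sequence, embedding)` tensor; an illustrative toy, not the Keras implementation):

```python
def global_average_pooling_1d(batch):
    """Average each sample's embedding vectors over the sequence axis."""
    out = []
    for sample in batch:                      # sample: (sequence, embedding)
        dim = len(sample[0])
        out.append([sum(step[d] for step in sample) / len(sample)
                    for d in range(dim)])
    return out

# One sample, three timesteps, embedding_dim = 2.
batch = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]]
print(global_average_pooling_1d(batch))  # [[3.0, 4.0]]
```

Whatever the sequence length, each sample collapses to a single embedding-sized vector, which is why the downstream Dense layers can have a fixed input size.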
```py
embedding_dim=16
model = keras.Sequential([
layers.Embedding(encoder.vocab_size, embedding_dim),
layers.GlobalAveragePooling1D(),
layers.Dense(16, activation='relu'),
layers.Dense(1)
])
model.summary()
```
```py
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_1 (Embedding) (None, None, 16) 130960
_________________________________________________________________
global_average_pooling1d (Gl (None, 16) 0
_________________________________________________________________
dense (Dense) (None, 16) 272
_________________________________________________________________
dense_1 (Dense) (None, 1) 17
=================================================================
Total params: 131,249
Trainable params: 131,249
Non-trainable params: 0
_________________________________________________________________
```
### Compile and train the model
```py
model.compile(optimizer='adam',
loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
metrics=['accuracy'])
history = model.fit(
train_batches,
epochs=10,
validation_data=test_batches, validation_steps=20)
```
```py
Epoch 1/10
2500/2500 [==============================] - 10s 4ms/step - loss: 0.4984 - accuracy: 0.7022 - val_loss: 0.3781 - val_accuracy: 0.8550
Epoch 2/10
2500/2500 [==============================] - 10s 4ms/step - loss: 0.2807 - accuracy: 0.8854 - val_loss: 0.3049 - val_accuracy: 0.8600
Epoch 3/10
2500/2500 [==============================] - 10s 4ms/step - loss: 0.2288 - accuracy: 0.9100 - val_loss: 0.3979 - val_accuracy: 0.8550
Epoch 4/10
2500/2500 [==============================] - 10s 4ms/step - loss: 0.1971 - accuracy: 0.9245 - val_loss: 0.4573 - val_accuracy: 0.8500
Epoch 5/10
2500/2500 [==============================] - 10s 4ms/step - loss: 0.1747 - accuracy: 0.9340 - val_loss: 0.3457 - val_accuracy: 0.8550
Epoch 6/10
2500/2500 [==============================] - 10s 4ms/step - loss: 0.1571 - accuracy: 0.9423 - val_loss: 0.4098 - val_accuracy: 0.8550
Epoch 7/10
2500/2500 [==============================] - 10s 4ms/step - loss: 0.1414 - accuracy: 0.9489 - val_loss: 0.4089 - val_accuracy: 0.8550
Epoch 8/10
2500/2500 [==============================] - 10s 4ms/step - loss: 0.1319 - accuracy: 0.9517 - val_loss: 0.5068 - val_accuracy: 0.7900
Epoch 9/10
2500/2500 [==============================] - 10s 4ms/step - loss: 0.1189 - accuracy: 0.9578 - val_loss: 0.4304 - val_accuracy: 0.8500
Epoch 10/10
2500/2500 [==============================] - 10s 4ms/step - loss: 0.1110 - accuracy: 0.9619 - val_loss: 0.6972 - val_accuracy: 0.8250
```
With this approach our model reaches a validation accuracy of around 88% (note that the model is overfitting; the training accuracy is significantly higher).
```py
import matplotlib.pyplot as plt
history_dict = history.history
acc = history_dict['accuracy']
val_acc = history_dict['val_accuracy']
loss=history_dict['loss']
val_loss=history_dict['val_loss']
epochs = range(1, len(acc) + 1)
plt.figure(figsize=(12,9))
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
plt.figure(figsize=(12,9))
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend(loc='lower right')
plt.ylim((0.5,1))
plt.show()
```
![png](img/815371be4cdb93da43df2c0cb17bb929.png)
![png](img/f9f505f9e0bb94757eb576cd0aa1c1f3.png)
## Retrieve the learned embeddings
Next, let's retrieve the word embeddings learned during training. This will be a matrix of shape `(vocab_size, embedding-dimension)`.
```py
e = model.layers[0]
weights = e.get_weights()[0]
print(weights.shape) # shape: (vocab_size, embedding_dim)
```
```py
(8185, 16)
```
We will now write the weights to disk. To use the [Embedding Projector](http://projector.tensorflow.org), we will upload two files in tab-separated format: a file of vectors (containing the embeddings), and a file of metadata (containing the words).
```py
import io
encoder = info.features['text'].encoder
out_v = io.open('vecs.tsv', 'w', encoding='utf-8')
out_m = io.open('meta.tsv', 'w', encoding='utf-8')
for num, word in enumerate(encoder.subwords):
vec = weights[num+1] # skip 0, it's padding.
out_m.write(word + "\n")
out_v.write('\t'.join([str(x) for x in vec]) + "\n")
out_v.close()
out_m.close()
```
If you are running this tutorial in [Colaboratory](https://colab.research.google.com), you can use the following snippet to download these files to your local machine (or use the file browser, *View -> Table of contents -> File browser*).
```py
try:
from google.colab import files
except ImportError:
pass
else:
files.download('vecs.tsv')
files.download('meta.tsv')
```
## Visualize the embeddings
To visualize our embeddings, we will upload them to the Embedding Projector.
Open the [Embedding Projector](http://projector.tensorflow.org/) (this can also run in a local TensorBoard instance).
* Click on "Load data".
* Upload the two files we created above: `vecs.tsv` and `meta.tsv`.
The embeddings you have trained will now be displayed. You can search for words to find their closest neighbors. For example, try searching for "beautiful"; you may see neighbors like "wonderful".
Note: your results may be a bit different, depending on how weights were randomly initialized before training the embedding layer.
Note: experimentally, you may be able to produce more interpretable embeddings by using a simpler model. Try deleting the `Dense(16)` layer, retraining the model, and visualizing the embeddings again.
![Screenshot of the embedding projector](img/16ea92d12fa8170f3e79e4c56f9affd1.png)
## Next steps
This tutorial has shown you how to train and visualize word embeddings from scratch on a small dataset.
* To learn about recurrent networks, see the [Keras RNN Guide](https://tensorflow.google.cn/guide/keras/rnn).
* To learn more about text classification (including the overall workflow, and if you are curious about when to use embeddings vs one-hot encodings), we recommend this practical text classification [guide](https://developers.google.cn/machine-learning/guides/text-classification/step-2-5).

# Text classification with an RNN
> Original: [https://tensorflow.google.cn/tutorials/text/text_classification_rnn](https://tensorflow.google.cn/tutorials/text/text_classification_rnn)
This text classification tutorial trains a [recurrent neural network](https://developers.google.cn/machine-learning/glossary/#recurrent_neural_network) on the [IMDB large movie review dataset](http://ai.stanford.edu/%7Eamaas/data/sentiment/) for sentiment analysis.
## Setup
```py
import tensorflow_datasets as tfds
import tensorflow as tf
```
Import `matplotlib` and create a helper function to plot graphs:
```py
import matplotlib.pyplot as plt
def plot_graphs(history, metric):
plt.plot(history.history[metric])
plt.plot(history.history['val_'+metric], '')
plt.xlabel("Epochs")
plt.ylabel(metric)
plt.legend([metric, 'val_'+metric])
plt.show()
```
## Set up the input pipeline
The IMDB large movie review dataset is a *binary classification* dataset: all the reviews have either a *positive* or *negative* sentiment.
Download the dataset using [TFDS](https://tensorflow.google.cn/datasets).
```py
dataset, info = tfds.load('imdb_reviews/subwords8k', with_info=True,
as_supervised=True)
train_dataset, test_dataset = dataset['train'], dataset['test']
```
```py
WARNING:absl:TFDS datasets with text encoding are deprecated and will be removed in a future version. Instead, you should use the plain text version and tokenize the text using `tensorflow_text` (See: https://www.tensorflow.org/tutorials/tensorflow_text/intro#tfdata_example)
Downloading and preparing dataset imdb_reviews/subwords8k/1.0.0 (download: 80.23 MiB, generated: Unknown size, total: 80.23 MiB) to /home/kbuilder/tensorflow_datasets/imdb_reviews/subwords8k/1.0.0...
Shuffling and writing examples to /home/kbuilder/tensorflow_datasets/imdb_reviews/subwords8k/1.0.0.incomplete7GBYY4/imdb_reviews-train.tfrecord
Shuffling and writing examples to /home/kbuilder/tensorflow_datasets/imdb_reviews/subwords8k/1.0.0.incomplete7GBYY4/imdb_reviews-test.tfrecord
Shuffling and writing examples to /home/kbuilder/tensorflow_datasets/imdb_reviews/subwords8k/1.0.0.incomplete7GBYY4/imdb_reviews-unsupervised.tfrecord
Dataset imdb_reviews downloaded and prepared to /home/kbuilder/tensorflow_datasets/imdb_reviews/subwords8k/1.0.0\. Subsequent calls will reuse this data.
```
The dataset `info` includes the encoder (`tfds.features.text.SubwordTextEncoder`).
```py
encoder = info.features['text'].encoder
```
```py
print('Vocabulary size: {}'.format(encoder.vocab_size))
```
```py
Vocabulary size: 8185
```
This text encoder will reversibly encode any string, falling back to byte-encoding if necessary.
```py
sample_string = 'Hello TensorFlow.'
encoded_string = encoder.encode(sample_string)
print('Encoded string is {}'.format(encoded_string))
original_string = encoder.decode(encoded_string)
print('The original string: "{}"'.format(original_string))
```
```py
Encoded string is [4025, 222, 6307, 2327, 4043, 2120, 7975]
The original string: "Hello TensorFlow."
```
```py
assert original_string == sample_string
```
```py
for index in encoded_string:
print('{} ----&gt; {}'.format(index, encoder.decode([index])))
```
```py
4025 ----&gt; Hell
222 ----&gt; o
6307 ----&gt; Ten
2327 ----&gt; sor
4043 ----&gt; Fl
2120 ----&gt; ow
7975 ----&gt; .
```
## Prepare the data for training
Next, create batches of these encoded strings. Use the `padded_batch` method to zero-pad the sequences to the length of the longest string in the batch:
```py
BUFFER_SIZE = 10000
BATCH_SIZE = 64
```
```py
train_dataset = train_dataset.shuffle(BUFFER_SIZE)
train_dataset = train_dataset.padded_batch(BATCH_SIZE)
test_dataset = test_dataset.padded_batch(BATCH_SIZE)
```
## Create the model
Build a [`tf.keras.Sequential`](https://tensorflow.google.cn/api_docs/python/tf/keras/Sequential) model and start with an embedding layer. An embedding layer stores one vector per word. When called, it converts the sequences of word indices to sequences of vectors. These vectors are trainable. After training (on enough data), words with similar meanings often have similar vectors.
This index-lookup is much more efficient than the equivalent operation of passing a one-hot encoded vector through a [`tf.keras.layers.Dense`](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/Dense) layer.
A recurrent neural network (RNN) processes sequence input by iterating through the elements. RNNs pass the output from one timestep to their input, and then to the next timestep.
The [`tf.keras.layers.Bidirectional`](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/Bidirectional) wrapper can also be used with an RNN layer. This propagates the input forward and backward through the RNN layer and then concatenates the outputs. This helps the RNN learn long-range dependencies.
```py
model = tf.keras.Sequential([
tf.keras.layers.Embedding(encoder.vocab_size, 64),
tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(1)
])
```
Please note that we choose a Keras Sequential model here since all the layers in the model only have a single input and produce a single output. If you want to use a stateful RNN layer, you might want to build your model with the Keras functional API or model subclassing instead, so that you can retrieve and reuse the RNN layer states. Please check the [Keras RNN guide](https://tensorflow.google.cn/guide/keras/rnn#rnn_state_reuse) for more details.
Compile the Keras model to configure the training process:
```py
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
optimizer=tf.keras.optimizers.Adam(1e-4),
metrics=['accuracy'])
```
## Train the model
```py
history = model.fit(train_dataset, epochs=10,
validation_data=test_dataset,
validation_steps=30)
```
```py
Epoch 1/10
391/391 [==============================] - 41s 105ms/step - loss: 0.6363 - accuracy: 0.5736 - val_loss: 0.4592 - val_accuracy: 0.8010
Epoch 2/10
391/391 [==============================] - 41s 105ms/step - loss: 0.3426 - accuracy: 0.8556 - val_loss: 0.3710 - val_accuracy: 0.8417
Epoch 3/10
391/391 [==============================] - 42s 107ms/step - loss: 0.2520 - accuracy: 0.9047 - val_loss: 0.3444 - val_accuracy: 0.8719
Epoch 4/10
391/391 [==============================] - 41s 105ms/step - loss: 0.2103 - accuracy: 0.9228 - val_loss: 0.3348 - val_accuracy: 0.8625
Epoch 5/10
391/391 [==============================] - 42s 106ms/step - loss: 0.1803 - accuracy: 0.9360 - val_loss: 0.3591 - val_accuracy: 0.8552
Epoch 6/10
391/391 [==============================] - 42s 106ms/step - loss: 0.1589 - accuracy: 0.9450 - val_loss: 0.4146 - val_accuracy: 0.8635
Epoch 7/10
391/391 [==============================] - 41s 105ms/step - loss: 0.1466 - accuracy: 0.9505 - val_loss: 0.3780 - val_accuracy: 0.8484
Epoch 8/10
391/391 [==============================] - 41s 106ms/step - loss: 0.1463 - accuracy: 0.9485 - val_loss: 0.4074 - val_accuracy: 0.8156
Epoch 9/10
391/391 [==============================] - 41s 106ms/step - loss: 0.1327 - accuracy: 0.9555 - val_loss: 0.4608 - val_accuracy: 0.8589
Epoch 10/10
391/391 [==============================] - 41s 105ms/step - loss: 0.1666 - accuracy: 0.9404 - val_loss: 0.4364 - val_accuracy: 0.8422
```
```py
test_loss, test_acc = model.evaluate(test_dataset)
print('Test Loss: {}'.format(test_loss))
print('Test Accuracy: {}'.format(test_acc))
```
```py
391/391 [==============================] - 17s 43ms/step - loss: 0.4305 - accuracy: 0.8477
Test Loss: 0.43051090836524963
Test Accuracy: 0.8476799726486206
```
The above model does not apply masking to the padded sequences. This can lead to skew if the model is trained on padded sequences and tested on un-padded sequences. Ideally you would [use masking](https://tensorflow.google.cn/guide/keras/masking_and_padding) to avoid this, but as you can see below it only has a small effect on the output.
Because the model outputs raw logits (it was compiled with `from_logits=True`), a prediction >= 0.0 (a sigmoid probability >= 0.5) is positive; otherwise it is negative.
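Note that the final Dense layer has no activation, so the model emits raw logits, and the 0.5 probability threshold corresponds to a logit of 0. A small standard-library sketch of that mapping:

```python
import math

def sigmoid(logit):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

print(sigmoid(0.0))            # 0.5 -- the decision boundary
print(sigmoid(-0.118) < 0.5)   # True: a negative logit means negative sentiment
```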
```py
def pad_to_size(vec, size):
zeros = [0] * (size - len(vec))
vec.extend(zeros)
return vec
```
```py
def sample_predict(sample_pred_text, pad):
encoded_sample_pred_text = encoder.encode(sample_pred_text)
if pad:
encoded_sample_pred_text = pad_to_size(encoded_sample_pred_text, 64)
encoded_sample_pred_text = tf.cast(encoded_sample_pred_text, tf.float32)
predictions = model.predict(tf.expand_dims(encoded_sample_pred_text, 0))
return (predictions)
```
```py
# predict on a sample text without padding.
sample_pred_text = ('The movie was cool. The animation and the graphics '
'were out of this world. I would recommend this movie.')
predictions = sample_predict(sample_pred_text, pad=False)
print(predictions)
```
```py
[[-0.11829309]]
```
```py
# predict on a sample text with padding
sample_pred_text = ('The movie was cool. The animation and the graphics '
'were out of this world. I would recommend this movie.')
predictions = sample_predict(sample_pred_text, pad=True)
print(predictions)
```
```py
[[-1.162545]]
```
```py
plot_graphs(history, 'accuracy')
```
![png](img/267bdfdd72740285a56d6dbc3f34c679.png)
```py
plot_graphs(history, 'loss')
```
![png](img/ae60ced5a9a18ef2a947912ada799ca0.png)
## Stack two or more LSTM layers
Keras recurrent layers have two available modes that are controlled by the `return_sequences` constructor argument:
* Return the full sequence of successive outputs for each timestep (a 3D tensor of shape `(batch_size, timesteps, output_features)`).
* Return only the last output for each input sequence (a 2D tensor of shape `(batch_size, output_features)`).
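The two modes can be sketched with a toy "RNN" in pure Python (the running-sum state is an illustrative stand-in for a real recurrent cell; batch and feature axes are omitted):

```python
def run_rnn(inputs, return_sequences):
    """Toy 'RNN': the state is a running sum of the inputs seen so far."""
    state, outputs = 0, []
    for x in inputs:
        state += x
        outputs.append(state)
    return outputs if return_sequences else outputs[-1]

seq = [1, 2, 3]
print(run_rnn(seq, return_sequences=True))   # [1, 3, 6] -- one output per timestep
print(run_rnn(seq, return_sequences=False))  # 6 -- only the final output
```

Stacking works because a layer with `return_sequences=True` hands a full sequence to the next recurrent layer, while the last layer returns only its final output.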
```py
model = tf.keras.Sequential([
tf.keras.layers.Embedding(encoder.vocab_size, 64),
tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dropout(0.5),
tf.keras.layers.Dense(1)
])
```
```py
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
optimizer=tf.keras.optimizers.Adam(1e-4),
metrics=['accuracy'])
```
```py
history = model.fit(train_dataset, epochs=10,
validation_data=test_dataset,
validation_steps=30)
```
```py
Epoch 1/10
391/391 [==============================] - 75s 192ms/step - loss: 0.6484 - accuracy: 0.5630 - val_loss: 0.4876 - val_accuracy: 0.7464
Epoch 2/10
391/391 [==============================] - 74s 190ms/step - loss: 0.3603 - accuracy: 0.8528 - val_loss: 0.3533 - val_accuracy: 0.8490
Epoch 3/10
391/391 [==============================] - 75s 191ms/step - loss: 0.2666 - accuracy: 0.9018 - val_loss: 0.3393 - val_accuracy: 0.8703
Epoch 4/10
391/391 [==============================] - 75s 193ms/step - loss: 0.2151 - accuracy: 0.9267 - val_loss: 0.3451 - val_accuracy: 0.8604
Epoch 5/10
391/391 [==============================] - 76s 194ms/step - loss: 0.1806 - accuracy: 0.9422 - val_loss: 0.3687 - val_accuracy: 0.8708
Epoch 6/10
391/391 [==============================] - 75s 193ms/step - loss: 0.1623 - accuracy: 0.9495 - val_loss: 0.3836 - val_accuracy: 0.8594
Epoch 7/10
391/391 [==============================] - 76s 193ms/step - loss: 0.1382 - accuracy: 0.9598 - val_loss: 0.4173 - val_accuracy: 0.8573
Epoch 8/10
391/391 [==============================] - 76s 194ms/step - loss: 0.1227 - accuracy: 0.9664 - val_loss: 0.4586 - val_accuracy: 0.8542
Epoch 9/10
391/391 [==============================] - 76s 194ms/step - loss: 0.0997 - accuracy: 0.9749 - val_loss: 0.4939 - val_accuracy: 0.8547
Epoch 10/10
391/391 [==============================] - 76s 194ms/step - loss: 0.0973 - accuracy: 0.9748 - val_loss: 0.5222 - val_accuracy: 0.8526
```
```py
test_loss, test_acc = model.evaluate(test_dataset)
print('Test Loss: {}'.format(test_loss))
print('Test Accuracy: {}'.format(test_acc))
```
```py
391/391 [==============================] - 30s 78ms/step - loss: 0.5205 - accuracy: 0.8572
Test Loss: 0.5204932689666748
Test Accuracy: 0.857200026512146
```
```py
# predict on a sample text without padding.
sample_pred_text = ('The movie was not good. The animation and the graphics '
'were terrible. I would not recommend this movie.')
predictions = sample_predict(sample_pred_text, pad=False)
print(predictions)
```
```py
[[-2.6377363]]
```
```py
# predict on a sample text with padding
sample_pred_text = ('The movie was not good. The animation and the graphics '
'were terrible. I would not recommend this movie.')
predictions = sample_predict(sample_pred_text, pad=True)
print(predictions)
```
```py
[[-3.0502243]]
```
```py
plot_graphs(history, 'accuracy')
```
![png](img/ee3ae6c62d5acf6adfea6458312bcb02.png)
```py
plot_graphs(history, 'loss')
```
![png](img/f2f53e7a4522a77ce6e821a299a77c76.png)
Check out other existing recurrent layers, such as [GRU layers](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/GRU).
If you are interested in building custom RNNs, see the [Keras RNN Guide](https://tensorflow.google.cn/guide/keras/rnn).

# Text generation with an RNN
> Original: [https://tensorflow.google.cn/tutorials/text/text_generation](https://tensorflow.google.cn/tutorials/text/text_generation)
This tutorial demonstrates how to generate text using a character-based RNN. We will work with a dataset of Shakespeare's writing from Andrej Karpathy's [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/). Given a sequence of characters from this data ("Shakespear"), train a model to predict the next character in the sequence ("e"). Longer sequences of text can be generated by calling the model repeatedly.
Note: Enable GPU acceleration to execute this notebook faster. In Colab, select: *Runtime > Change runtime type > Hardware accelerator > GPU*. If running locally, make sure TensorFlow is version 1.11 or higher.
This tutorial includes runnable code implemented using [tf.keras](https://tensorflow.google.cn/programmers_guide/keras) and [eager execution](https://tensorflow.google.cn/programmers_guide/eager). The following is sample output when the model in this tutorial trained for 30 epochs and was started with the string "Q":
```py
QUEENE:
I had thought thou hadst a Roman; for the oracle,
Thus by All bids the man against the word,
Which are so weak of care, by old care done;
Your children were in your holy love,
And the precipitation through the bleeding throne.
BISHOP OF ELY:
Marry, and will, my lord, to weep in such a one were prettiest;
Yet now I was adopted heir
Of the world's lamentable day,
To watch the next way with his father with his face?
ESCALUS:
The cause why then we are all resolved more sons.
VOLUMNIA:
O, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, it is no sin it should be dead,
And love and pale as any will to that word.
QUEEN ELIZABETH:
But how long have I heard the soul for this world,
And show his hands of life be proved to stand.
PETRUCHIO:
I say he look'd on, if I must be content
To stay him from the fatal of our country's bliss.
His lordship pluck'd from this sentence then for prey,
And then let us twain, being the moon,
were she such a case as fills m
```
While some of the sentences are grammatical, most do not make sense. The model has not learned the meaning of words, but consider:
* The model is character-based. When training started, the model did not know how to spell an English word, or even that words were a unit of text.
* The structure of the output resembles a play: blocks of text generally begin with a speaker name, and, as in the dataset, these names are written in all capital letters.
* As demonstrated below, the model is trained on small batches of text (100 characters each), and it is still able to generate longer sequences of text with coherent structure.
## Setup
### Import TensorFlow and other libraries
```py
import tensorflow as tf
import numpy as np
import os
import time
```
### Download the Shakespeare dataset
Change the following line to run this code on your own data.
```py
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')
```
```py
Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt
1122304/1115394 [==============================] - 0s 0us/step
```
### Read the data
First, look in the text:
```py
# Read, then decode for py2 compat.
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# length of text is the number of characters in it
print ('Length of text: {} characters'.format(len(text)))
```
```py
Length of text: 1115394 characters
```
```py
# Take a look at the first 250 characters in the text
print(text[:250])
```
```py
First Citizen:
Before we proceed any further, hear me speak.
All:
Speak, speak.
First Citizen:
You are all resolved rather to die than to famish?
All:
Resolved. resolved.
First Citizen:
First, you know Caius Marcius is chief enemy to the people.
```
```py
# The unique characters in the file
vocab = sorted(set(text))
print ('{} unique characters'.format(len(vocab)))
```
```py
65 unique characters
```
## Process the text
### Vectorize the text
Before training, we need to map strings to a numerical representation. Create two lookup tables: one mapping characters to numbers, and another mapping numbers back to characters.
```py
# Create a mapping from unique characters to indices
char2idx = {u:i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)
text_as_int = np.array([char2idx[c] for c in text])
```
Now each character has an integer representation. Note that we mapped the characters to indices from 0 to `len(unique)`.
```py
print('{')
for char,_ in zip(char2idx, range(20)):
    print(' {:4s}: {:3d},'.format(repr(char), char2idx[char]))
print(' ...\n}')
```
```py
{
'\n': 0,
' ' : 1,
'!' : 2,
'$' : 3,
'&' : 4,
"'" : 5,
',' : 6,
'-' : 7,
'.' : 8,
'3' : 9,
':' : 10,
';' : 11,
'?' : 12,
'A' : 13,
'B' : 14,
'C' : 15,
'D' : 16,
'E' : 17,
'F' : 18,
'G' : 19,
...
}
```
```py
# Show how the first 13 characters of the text are mapped to integers
print ('{} ---- characters mapped to int ---- > {}'.format(repr(text[:13]), text_as_int[:13]))
```
```py
'First Citizen' ---- characters mapped to int ---- > [18 47 56 57 58 1 15 47 58 47 64 43 52]
```
### The prediction task
Given a character, or a sequence of characters, what is the most probable next character? This is the task we are training the model to perform. The input to the model is a sequence of characters, and we train the model to predict the output: the following character at each time step.
Since RNNs maintain an internal state that depends on the previously seen elements, given all the characters computed up to this moment, what is the next character?
### Create training examples and targets
Next, divide the text into example sequences. Each input sequence will contain `seq_length` characters from the text.
For each input sequence, the corresponding target contains the same length of text, except shifted one character to the right.
Break the text into chunks of length `seq_length+1`. For example, say `seq_length` is 4 and our text is "Hello": the input sequence would be "Hell" and the target sequence "ello".
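The chunking rule can be sketched in plain Python with the "Hello" example above (no TensorFlow needed):

```python
def split_input_target(chunk):
    # Input is everything but the last character;
    # the target is the same chunk shifted right by one.
    return chunk[:-1], chunk[1:]

seq_length = 4
chunk = "Hello"[:seq_length + 1]      # a chunk of length seq_length + 1
input_text, target_text = split_input_target(chunk)
print(input_text, target_text)        # Hell ello
```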
To do this, first use the [`tf.data.Dataset.from_tensor_slices`](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#from_tensor_slices) function to convert the text vector into a stream of character indices.
```py
# The maximum length sentence we want for a single input, in characters
seq_length = 100
examples_per_epoch = len(text)//seq_length

# Create training examples / targets
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
for i in char_dataset.take(5):
    print(idx2char[i.numpy()])
```
```py
F
i
r
s
t
```
The `batch` method lets us easily convert these individual characters into sequences of the desired size.
```py
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)
for item in sequences.take(5):
    print(repr(''.join(idx2char[item.numpy()])))
```
```py
'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '
'are all resolved rather to die than to famish?\n\nAll:\nResolved. resolved.\n\nFirst Citizen:\nFirst, you k'
"now Caius Marcius is chief enemy to the people.\n\nAll:\nWe know't, we know't.\n\nFirst Citizen:\nLet us ki"
"ll him, and we'll have corn at our own price.\nIs't a verdict?\n\nAll:\nNo more talking on't; let it be d"
'one: away, away!\n\nSecond Citizen:\nOne word, good citizens.\n\nFirst Citizen:\nWe are accounted poor citi'
```
For each sequence, duplicate and shift it to form the input and target text by using the `map` method, which applies a simple function to each batch:
```py
def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)
```
Print the input and target values of the first example:
```py
for input_example, target_example in dataset.take(1):
    print ('Input data: ', repr(''.join(idx2char[input_example.numpy()])))
    print ('Target data:', repr(''.join(idx2char[target_example.numpy()])))
```
```py
Input data: 'First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou'
Target data: 'irst Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n\nFirst Citizen:\nYou '
```
Each index of these vectors is processed as a single time step. For the input at time step 0, the model receives the index for "F" and tries to predict the index for "i" as the next character. At the next time step it does the same thing, but the `RNN` considers the previous step's context in addition to the current input character.
```py
for i, (input_idx, target_idx) in enumerate(zip(input_example[:5], target_example[:5])):
    print("Step {:4d}".format(i))
    print("  input: {} ({:s})".format(input_idx, repr(idx2char[input_idx])))
    print("  expected output: {} ({:s})".format(target_idx, repr(idx2char[target_idx])))
```
```py
Step 0
input: 18 ('F')
expected output: 47 ('i')
Step 1
input: 47 ('i')
expected output: 56 ('r')
Step 2
input: 56 ('r')
expected output: 57 ('s')
Step 3
input: 57 ('s')
expected output: 58 ('t')
Step 4
input: 58 ('t')
expected output: 1 (' ')
```
### Create training batches
We used [`tf.data`](https://tensorflow.google.cn/api_docs/python/tf/data) to split the text into manageable sequences. But before feeding this data into the model, we need to shuffle the data and pack it into batches.
```py
# Batch size
BATCH_SIZE = 64

# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements.)
BUFFER_SIZE = 10000
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
dataset
```
```py
<BatchDataset shapes: ((64, 100), (64, 100)), types: (tf.int64, tf.int64)>
```
## Build the model
Use [`tf.keras.Sequential`](https://tensorflow.google.cn/api_docs/python/tf/keras/Sequential) to define the model. For this simple example, three layers are used:
* [`tf.keras.layers.Embedding`](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/Embedding): The input layer. A trainable lookup table that maps each character's number to a vector with `embedding_dim` dimensions.
* [`tf.keras.layers.GRU`](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/GRU): A type of RNN with size `units=rnn_units` (you can also use an LSTM layer here).
* [`tf.keras.layers.Dense`](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/Dense): The output layer, with `vocab_size` outputs.
```py
# Length of the vocabulary in chars
vocab_size = len(vocab)

# The embedding dimension
embedding_dim = 256

# Number of RNN units
rnn_units = 1024
```
```py
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                  batch_input_shape=[batch_size, None]),
        tf.keras.layers.GRU(rnn_units,
                            return_sequences=True,
                            stateful=True,
                            recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size)
    ])
    return model
```
```py
model = build_model(
vocab_size = len(vocab),
embedding_dim=embedding_dim,
rnn_units=rnn_units,
batch_size=BATCH_SIZE)
```
For each character, the model looks up the embedding, runs the GRU one time step with the embedding as input, and applies the dense layer to generate logits predicting the log-likelihood of the next character. ![A drawing of the data passing through the model](img/643d654e7e1e3d928041b42363e0f099.png)
## Try the model
Now run the model to see that it behaves as expected.
First check the shape of the output:
```py
for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)
    print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")
```
```py
(64, 100, 65) # (batch_size, sequence_length, vocab_size)
```
In the above example, the sequence length of the input is `100`, but the model can be run on inputs of any length:
```py
model.summary()
```
```py
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding (Embedding) (64, None, 256) 16640
_________________________________________________________________
gru (GRU) (64, None, 1024) 3938304
_________________________________________________________________
dense (Dense) (64, None, 65) 66625
=================================================================
Total params: 4,021,569
Trainable params: 4,021,569
Non-trainable params: 0
_________________________________________________________________
```
To get actual predictions from the model, we need to sample from the output distribution to obtain actual character indices. This distribution is defined by the logits over the character vocabulary.
Note: It is important to *sample* from this distribution, because taking the *argmax* of the distribution can easily get the model stuck in a loop.
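To illustrate why sampling matters, here is a small NumPy sketch with made-up logits for a three-character vocabulary: the argmax is always the same index, while categorical sampling varies.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5])             # hypothetical logits, not from the model
probs = np.exp(logits) / np.exp(logits).sum()  # softmax over the vocabulary

greedy = int(np.argmax(logits))                # deterministic: always index 0
samples = rng.choice(len(logits), size=10, p=probs)  # stochastic: indices vary
print(greedy, samples.tolist())
```

In the tutorial this role is played by `tf.random.categorical`, which samples directly from the logits.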
Try it for the first example in the batch:
```py
sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices,axis=-1).numpy()
```
This gives us, at each time step, a prediction of the next character index.
```py
sampled_indices
```
```py
array([ 3, 19, 11, 8, 17, 50, 14, 5, 16, 57, 51, 53, 17, 54, 9, 11, 22,
13, 36, 57, 57, 50, 47, 22, 5, 7, 1, 59, 3, 26, 52, 2, 62, 30,
54, 18, 62, 9, 63, 2, 22, 11, 18, 12, 63, 0, 13, 16, 38, 49, 21,
25, 22, 53, 39, 63, 3, 26, 39, 15, 21, 56, 49, 39, 20, 55, 5, 39,
61, 29, 21, 39, 39, 63, 48, 11, 27, 42, 59, 0, 19, 58, 57, 27, 40,
13, 53, 13, 7, 4, 21, 32, 10, 57, 18, 30, 54, 36, 12, 3])
```
Decode these to see the text predicted by this untrained model:
```py
print("Input: \n", repr("".join(idx2char[input_example_batch[0]])))
print()
print("Next Char Predictions: \n", repr("".join(idx2char[sampled_indices ])))
```
```py
Input:
'e, I say! madam! sweet-heart! why, bride!\nWhat, not a word? you take your pennyworths now;\nSleep for'
Next Char Predictions:
"$G;.ElB'DsmoEp3;JAXssliJ'- u$Nn!xRpFx3y!J;F?y\nADZkIMJoay$NaCIrkaHq'awQIaayj;Odu\nGtsObAoA-&IT:sFRpX?$"
```
## Train the model
At this point the problem can be treated as a standard classification problem: given the previous RNN state and the input at this time step, predict the class of the next character.
### Attach an optimizer and a loss function
The standard [`tf.keras.losses.sparse_categorical_crossentropy`](https://tensorflow.google.cn/api_docs/python/tf/keras/losses/sparse_categorical_crossentropy) loss function works in this case because it is applied across the last dimension of the predictions.
Because our model returns logits, we need to set the `from_logits` flag:
```py
def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

example_batch_loss = loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("scalar_loss: ", example_batch_loss.numpy().mean())
```
```py
Prediction shape: (64, 100, 65) # (batch_size, sequence_length, vocab_size)
scalar_loss: 4.1736827
```
Configure the training procedure using the [`tf.keras.Model.compile`](https://tensorflow.google.cn/api_docs/python/tf/keras/Model#compile) method. We will use [`tf.keras.optimizers.Adam`](https://tensorflow.google.cn/api_docs/python/tf/keras/optimizers/Adam) with default arguments, along with the loss function.
```py
model.compile(optimizer='adam', loss=loss)
```
### Configure checkpoints
Use a [`tf.keras.callbacks.ModelCheckpoint`](https://tensorflow.google.cn/api_docs/python/tf/keras/callbacks/ModelCheckpoint) to ensure that checkpoints are saved during training.
```py
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")
checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
filepath=checkpoint_prefix,
save_weights_only=True)
```
### Execute the training
To keep training time reasonable, use 10 epochs to train the model. In Colab, set the runtime to GPU for faster training.
```py
EPOCHS=10
```
```py
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])
```
```py
Epoch 1/10
172/172 [==============================] - 5s 27ms/step - loss: 2.6663
Epoch 2/10
172/172 [==============================] - 5s 27ms/step - loss: 1.9452
Epoch 3/10
172/172 [==============================] - 5s 27ms/step - loss: 1.6797
Epoch 4/10
172/172 [==============================] - 5s 27ms/step - loss: 1.5355
Epoch 5/10
172/172 [==============================] - 5s 27ms/step - loss: 1.4493
Epoch 6/10
172/172 [==============================] - 5s 27ms/step - loss: 1.3900
Epoch 7/10
172/172 [==============================] - 5s 27ms/step - loss: 1.3457
Epoch 8/10
172/172 [==============================] - 5s 26ms/step - loss: 1.3076
Epoch 9/10
172/172 [==============================] - 5s 27ms/step - loss: 1.2732
Epoch 10/10
172/172 [==============================] - 5s 27ms/step - loss: 1.2412
```
## Generate text
### Restore the latest checkpoint
To keep this prediction step simple, use a batch size of 1.
Because of the way the RNN state is passed from time step to time step, the model only accepts a fixed batch size once it is built.
To run the model with a different `batch_size`, we need to rebuild the model and restore the weights from the checkpoint.
```py
tf.train.latest_checkpoint(checkpoint_dir)
```
```py
'./training_checkpoints/ckpt_10'
```
```py
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))
```
```py
model.summary()
```
```py
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_1 (Embedding) (1, None, 256) 16640
_________________________________________________________________
gru_1 (GRU) (1, None, 1024) 3938304
_________________________________________________________________
dense_1 (Dense) (1, None, 65) 66625
=================================================================
Total params: 4,021,569
Trainable params: 4,021,569
Non-trainable params: 0
_________________________________________________________________
```
### The prediction loop
The following code block generates the text:
* It starts by choosing a start string, initializing the RNN state, and setting the number of characters to generate.
* It gets the prediction distribution of the next character using the start string and the RNN state.
* It then uses a categorical distribution to calculate the index of the predicted character, and uses this predicted character as the model's next input.
* The RNN state returned by the model is fed back into the model so that it now has more context than a single character. After predicting the next character, the modified RNN state is again fed back into the model, which is how the model learns as it gets more context from the previously predicted characters.
![To generate text the model's output is fed back to the input](img/6ae78bb4c1ad3a2e0ade4489d4fdf706.png)
Looking at the generated text, you'll see the model knows when to capitalize and make paragraphs, and it imitates a Shakespeare-like vocabulary. With the small number of training epochs, it has not yet learned to form coherent sentences.
```py
def generate_text(model, start_string):
    # Evaluation step (generating text using the learned model)

    # Number of characters to generate
    num_generate = 1000

    # Converting our start string to numbers (vectorizing)
    input_eval = [char2idx[s] for s in start_string]
    input_eval = tf.expand_dims(input_eval, 0)

    # Empty string to store our results
    text_generated = []

    # Low temperature results in more predictable text.
    # Higher temperature results in more surprising text.
    # Experiment to find the best setting.
    temperature = 1.0

    # Here batch size == 1
    model.reset_states()
    for i in range(num_generate):
        predictions = model(input_eval)
        # remove the batch dimension
        predictions = tf.squeeze(predictions, 0)

        # Using a categorical distribution to predict the character returned by the model
        predictions = predictions / temperature
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()

        # Pass the predicted character and the previous hidden state
        # to the model as its next inputs
        input_eval = tf.expand_dims([predicted_id], 0)

        text_generated.append(idx2char[predicted_id])

    return (start_string + ''.join(text_generated))
```
```py
print(generate_text(model, start_string=u"ROMEO: "))
```
```py
ROMEO: in't, Romeo rather
say, bid me not say, the adden, and you man for all.
Now, good Cart, or do held. Well, leaving her son,
Some stomacame, brother, Edommen.
PROSPERO:
My lord Hastings, for death,
Or as believell you be accoment.
TRANIO:
Mistraising? come, get abseng house:
The that was a life upon none of the equard sud,
Great Aufidius any joy;
For well a fool, and loveth one stay,
To whom Gare his moved me of Marcius shoulded.
Pite o'erposens to him.
KING RICHARD II:
Come, civil and live, if wet to help and raisen fellow.
CORIOLANUS:
Mark, here, sir. But the palace-hate will be at him in
some wondering danger, my bestilent.
DUKE OF AUMERLE:
You, my lord? my dearly uncles for,
If't be fown'd for truth enough not him,
He talk of youngest young princely sake.
ROMEO:
This let me have a still before the queen
First worthy angel. Would yes, by return.
BAPTISTA:
You have dan,
Dies, renown awrifes; I'll say you.
Provost:
And, come, make it out.
LEONTES:
They call thee, hangions,
Not
```
The easiest thing you can do to improve the results is to train the model for longer (try `EPOCHS=30`).
You can also experiment with a different start string, try adding another RNN layer to improve the model's accuracy, or adjust the temperature parameter to generate more or less random predictions.
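The temperature knob can be sketched with a plain softmax (the logits here are illustrative): dividing by a temperature below 1 sharpens the distribution toward the top character, while a temperature above 1 flattens it.

```python
import math

def softmax(logits, temperature=1.0):
    # Scale the logits by the temperature, then normalize.
    scaled = [x / temperature for x in logits]
    exps = [math.exp(x) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax(logits, temperature=0.5)  # more predictable text
hot = softmax(logits, temperature=2.0)   # more surprising text
print(max(cold), max(hot))
```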
## Advanced: Customized training
The training procedure above is simple, but does not give you much control.
So now that you've seen how to run the model manually, let's unpack the training loop and implement it ourselves. This gives a starting point if, for example, you want to implement *curriculum learning* to help stabilize the model's open-loop output.
You will use [`tf.GradientTape`](https://tensorflow.google.cn/api_docs/python/tf/GradientTape) to track the gradients. You can learn more about this approach by reading the [eager execution guide](https://tensorflow.google.cn/guide/eager).
The procedure works as follows:
* First, initialize the RNN state using the [`tf.keras.Model.reset_states`](https://tensorflow.google.cn/api_docs/python/tf/keras/Model#reset_states) method.
* Next, iterate over the dataset (batch by batch) and calculate the *predictions* associated with each.
* Open a [`tf.GradientTape`](https://tensorflow.google.cn/api_docs/python/tf/GradientTape), and calculate the predictions and loss in that context.
* Calculate the gradients of the loss with respect to the model variables using the `tf.GradientTape.gradient` method.
* Finally, take a step downwards using the optimizer's `tf.train.Optimizer.apply_gradients` method.
```py
model = build_model(
vocab_size = len(vocab),
embedding_dim=embedding_dim,
rnn_units=rnn_units,
batch_size=BATCH_SIZE)
```
```py
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.iter
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.beta_1
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.beta_2
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.decay
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer.learning_rate
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-0.embeddings
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-2.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-2.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-1.cell.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-1.cell.recurrent_kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-1.cell.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-0.embeddings
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-2.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-2.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-1.cell.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-1.cell.recurrent_kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-1.cell.bias
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
```
```py
optimizer = tf.keras.optimizers.Adam()
```
```py
@tf.function
def train_step(inp, target):
    with tf.GradientTape() as tape:
        predictions = model(inp)
        loss = tf.reduce_mean(
            tf.keras.losses.sparse_categorical_crossentropy(
                target, predictions, from_logits=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```
```py
# Training step
EPOCHS = 10

for epoch in range(EPOCHS):
    start = time.time()

    # initializing the hidden state at the start of every epoch
    # initially hidden is None
    hidden = model.reset_states()

    for (batch_n, (inp, target)) in enumerate(dataset):
        loss = train_step(inp, target)

        if batch_n % 100 == 0:
            template = 'Epoch {} Batch {} Loss {}'
            print(template.format(epoch + 1, batch_n, loss))

    # saving (checkpoint) the model every 5 epochs
    if (epoch + 1) % 5 == 0:
        model.save_weights(checkpoint_prefix.format(epoch=epoch))

    print('Epoch {} Loss {:.4f}'.format(epoch + 1, loss))
    print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

model.save_weights(checkpoint_prefix.format(epoch=epoch))
```
```py
Epoch 1 Batch 0 Loss 4.173541069030762
Epoch 1 Batch 100 Loss 2.3451342582702637
Epoch 1 Loss 2.1603
Time taken for 1 epoch 6.5293896198272705 sec
Epoch 2 Batch 0 Loss 2.1137943267822266
Epoch 2 Batch 100 Loss 1.9266924858093262
Epoch 2 Loss 1.7417
Time taken for 1 epoch 5.6192779541015625 sec
Epoch 3 Batch 0 Loss 1.775771975517273
Epoch 3 Batch 100 Loss 1.657868504524231
Epoch 3 Loss 1.5520
Time taken for 1 epoch 5.231291770935059 sec
Epoch 4 Batch 0 Loss 1.543768048286438
Epoch 4 Batch 100 Loss 1.5487240552902222
Epoch 4 Loss 1.4920
Time taken for 1 epoch 5.363192319869995 sec
Epoch 5 Batch 0 Loss 1.4550749063491821
Epoch 5 Batch 100 Loss 1.4589751958847046
Epoch 5 Loss 1.4171
Time taken for 1 epoch 5.297640085220337 sec
Epoch 6 Batch 0 Loss 1.376267671585083
Epoch 6 Batch 100 Loss 1.3637677431106567
Epoch 6 Loss 1.3818
Time taken for 1 epoch 5.299052476882935 sec
Epoch 7 Batch 0 Loss 1.2916797399520874
Epoch 7 Batch 100 Loss 1.3284915685653687
Epoch 7 Loss 1.3983
Time taken for 1 epoch 5.277729749679565 sec
Epoch 8 Batch 0 Loss 1.2573177814483643
Epoch 8 Batch 100 Loss 1.2979872226715088
Epoch 8 Loss 1.3120
Time taken for 1 epoch 5.250093460083008 sec
Epoch 9 Batch 0 Loss 1.3046417236328125
Epoch 9 Batch 100 Loss 1.2858468294143677
Epoch 9 Loss 1.3266
Time taken for 1 epoch 5.280868291854858 sec
Epoch 10 Batch 0 Loss 1.1859409809112549
Epoch 10 Batch 100 Loss 1.2690430879592896
Epoch 10 Loss 1.2733
Time taken for 1 epoch 5.34737491607666 sec
```

# Neural machine translation with attention
> Original: [https://tensorflow.google.cn/tutorials/text/nmt_with_attention](https://tensorflow.google.cn/tutorials/text/nmt_with_attention)
This notebook trains a sequence-to-sequence (seq2seq) model for Spanish-to-English translation. This is an advanced example that assumes some knowledge of sequence-to-sequence models.
After training the model in this notebook, you will be able to input a Spanish sentence, such as *"¿todavia estan en casa?"*, and return its English translation: *"are you still at home?"*
The translation quality is reasonable for a toy example, but the generated attention plot is perhaps more interesting: it shows which parts of the input sentence had the model's attention while translating.
![spanish-english attention plot](img/295a20785cb201af0f19ee7414550082.png)
Note: This example takes approximately 10 minutes to run on a single P100 GPU.
```py
import tensorflow as tf
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from sklearn.model_selection import train_test_split
import unicodedata
import re
import numpy as np
import os
import io
import time
```
## Download and prepare the dataset
We'll use a language dataset provided by [http://www.manythings.org/anki/](http://www.manythings.org/anki/). This dataset contains language translation pairs in the format:
```py
May I borrow this book? ¿Puedo tomar prestado este libro?
```
There are a variety of languages available in this dataset. We will use the English-Spanish dataset. For convenience, we host a copy of this dataset on Google Cloud, but you can also download your own copy. After downloading the dataset, here are the steps we'll take to prepare the data:
1. Add a *start* and *end* token to each sentence.
2. Clean the sentences by removing special characters.
3. Create a word index and a reverse word index (dictionaries mapping from word → id and id → word).
4. Pad each sentence to a maximum length.
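Steps 3 and 4 can be sketched in plain Python; the dictionary below is a toy stand-in for `tf.keras.preprocessing.text.Tokenizer`, which the actual code uses.

```python
sentences = [["<start>", "may", "i", "borrow", "this", "book", "?", "<end>"],
             ["<start>", "hello", "<end>"]]

# Step 3: word index and reverse word index (id 0 is reserved for padding)
vocab = sorted({w for s in sentences for w in s})
word_index = {w: i + 1 for i, w in enumerate(vocab)}
index_word = {i: w for w, i in word_index.items()}

# Step 4: convert words to ids, then pad every sentence to the maximum length
tensor = [[word_index[w] for w in s] for s in sentences]
max_len = max(len(s) for s in tensor)
padded = [s + [0] * (max_len - len(s)) for s in tensor]
print(padded)
```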
```py
# Download the file
path_to_zip = tf.keras.utils.get_file(
'spa-eng.zip', origin='http://storage.googleapis.com/download.tensorflow.org/data/spa-eng.zip',
extract=True)
path_to_file = os.path.dirname(path_to_zip)+"/spa-eng/spa.txt"
```
```py
Downloading data from http://storage.googleapis.com/download.tensorflow.org/data/spa-eng.zip
2646016/2638744 [==============================] - 0s 0us/step
```
```py
# Converts the unicode file to ascii
def unicode_to_ascii(s):
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')

def preprocess_sentence(w):
    w = unicode_to_ascii(w.lower().strip())

    # creating a space between a word and the punctuation following it
    # eg: "he is a boy." => "he is a boy ."
    # Reference: https://stackoverflow.com/questions/3645931/python-padding-punctuation-with-white-spaces-keeping-punctuation
    w = re.sub(r"([?.!,¿])", r" \1 ", w)
    w = re.sub(r'[" "]+', " ", w)

    # replacing everything with space except (a-z, A-Z, ".", "?", "!", ",")
    w = re.sub(r"[^a-zA-Z?.!,¿]+", " ", w)

    w = w.rstrip().strip()

    # adding a start and an end token to the sentence
    # so that the model knows when to start and stop predicting
    w = '<start> ' + w + ' <end>'
    return w
```
```py
en_sentence = u"May I borrow this book?"
sp_sentence = u"¿Puedo tomar prestado este libro?"
print(preprocess_sentence(en_sentence))
print(preprocess_sentence(sp_sentence).encode('utf-8'))
```
```py
<start> may i borrow this book ? <end>
b'<start> \xc2\xbf puedo tomar prestado este libro ? <end>'
```
```py
# 1. Remove the accents
# 2. Clean the sentences
# 3. Return word pairs in the format: [ENGLISH, SPANISH]
def create_dataset(path, num_examples):
    lines = io.open(path, encoding='UTF-8').read().strip().split('\n')
    word_pairs = [[preprocess_sentence(w) for w in l.split('\t')] for l in lines[:num_examples]]
    return zip(*word_pairs)
```
```py
en, sp = create_dataset(path_to_file, None)
print(en[-1])
print(sp[-1])
```
```py
<start> if you want to sound like a native speaker , you must be willing to practice saying the same sentence over and over in the same way that banjo players practice the same phrase over and over until they can play it correctly and at the desired tempo . <end>
<start> si quieres sonar como un hablante nativo , debes estar dispuesto a practicar diciendo la misma frase una y otra vez de la misma manera en que un musico de banjo practica el mismo fraseo una y otra vez hasta que lo puedan tocar correctamente y en el tiempo esperado . <end>
```
```py
def max_length(tensor):
    return max(len(t) for t in tensor)
```
```py
def tokenize(lang):
    lang_tokenizer = tf.keras.preprocessing.text.Tokenizer(filters='')
    lang_tokenizer.fit_on_texts(lang)

    tensor = lang_tokenizer.texts_to_sequences(lang)
    tensor = tf.keras.preprocessing.sequence.pad_sequences(tensor,
                                                           padding='post')
    return tensor, lang_tokenizer
```
```py
def load_dataset(path, num_examples=None):
    # creating cleaned input, output pairs
    targ_lang, inp_lang = create_dataset(path, num_examples)

    input_tensor, inp_lang_tokenizer = tokenize(inp_lang)
    target_tensor, targ_lang_tokenizer = tokenize(targ_lang)

    return input_tensor, target_tensor, inp_lang_tokenizer, targ_lang_tokenizer
```
### Limit the size of the dataset to experiment faster (optional)
Training on the complete dataset of more than 100,000 sentences will take a long time. To train faster, we can limit the size of the dataset to 30,000 sentences (of course, translation quality degrades with less data):
```py
# Try experimenting with the size of that dataset
num_examples = 30000
input_tensor, target_tensor, inp_lang, targ_lang = load_dataset(path_to_file, num_examples)

# Calculate max_length of the target tensors
max_length_targ, max_length_inp = max_length(target_tensor), max_length(input_tensor)
```
```py
# Creating training and validation sets using an 80-20 split
input_tensor_train, input_tensor_val, target_tensor_train, target_tensor_val = train_test_split(input_tensor, target_tensor, test_size=0.2)

# Show length
print(len(input_tensor_train), len(target_tensor_train), len(input_tensor_val), len(target_tensor_val))
```
```py
24000 24000 6000 6000
```
```py
def convert(lang, tensor):
    for t in tensor:
        if t != 0:
            print("%d ----> %s" % (t, lang.index_word[t]))
```
```py
print ("Input Language; index to word mapping")
convert(inp_lang, input_tensor_train[0])
print ()
print ("Target Language; index to word mapping")
convert(targ_lang, target_tensor_train[0])
```
```py
Input Language; index to word mapping
1 ----> <start>
13 ----> la
1999 ----> belleza
7 ----> es
8096 ----> subjetiva
3 ----> .
2 ----> <end>
Target Language; index to word mapping
1 ----> <start>
1148 ----> beauty
8 ----> is
4299 ----> subjective
3 ----> .
2 ----> <end>
```
### Create a tf.data dataset
```py
BUFFER_SIZE = len(input_tensor_train)
BATCH_SIZE = 64
steps_per_epoch = len(input_tensor_train)//BATCH_SIZE
embedding_dim = 256
units = 1024
vocab_inp_size = len(inp_lang.word_index)+1
vocab_tar_size = len(targ_lang.word_index)+1
dataset = tf.data.Dataset.from_tensor_slices((input_tensor_train, target_tensor_train)).shuffle(BUFFER_SIZE)
dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)
```
```py
example_input_batch, example_target_batch = next(iter(dataset))
example_input_batch.shape, example_target_batch.shape
```
```py
(TensorShape([64, 16]), TensorShape([64, 11]))
```
## Write the encoder and decoder model
Implement an encoder-decoder model with attention, which you can read about in the TensorFlow [Neural Machine Translation (seq2seq) tutorial](https://github.com/tensorflow/nmt). This example uses a more recent set of APIs. This notebook implements the [attention equations](https://github.com/tensorflow/nmt#background-on-the-attention-mechanism) from the seq2seq tutorial. The following diagram shows that each input word is assigned a weight by the attention mechanism, which is then used by the decoder to predict the next word in the sentence. The image and formulas below are an example of the attention mechanism from [Luong's paper](https://arxiv.org/abs/1508.04025v5).
![attention mechanism](img/b8397a070205f9293fbc989d8421eec5.png)
The input is put through an encoder model which gives us the encoder output of shape *(batch_size, max_length, hidden_size)* and the encoder hidden state of shape *(batch_size, hidden_size)*.
Here are the equations that are implemented:
![attention equation 0](img/20fad379e19d0355132a97db41137f4b.png) ![attention equation 1](img/9c9248a99f6346e02b6be5c21e5ab7be.png)
This tutorial uses [Bahdanau attention](https://arxiv.org/pdf/1409.0473.pdf) for the encoder. Let's decide on the notation before writing the simplified form:
* FC = Fully connected (dense) layer
* EO = Encoder output
* H = hidden state
* X = input to the decoder
And the pseudo-code:
* `score = FC(tanh(FC(EO) + FC(H)))`
* `attention weights = softmax(score, axis = 1)`. Softmax by default is applied on the last axis, but here we want to apply it on the *1st axis*, since the shape of the score is *(batch_size, max_length, hidden_size)*. `max_length` is the length of our input. Since we are trying to assign a weight to each input, softmax should be applied on that axis.
* `context vector = sum(attention weights * EO, axis = 1)`. Same reason as above for choosing axis 1.
* `embedding output` = The input to the decoder X is passed through an embedding layer.
* `merged vector = concat(embedding output, context vector)`
* This merged vector is then given to the GRU
The shapes of all the vectors at each step have been specified in the comments in the code:
```py
class Encoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
        super(Encoder, self).__init__()
        self.batch_sz = batch_sz
        self.enc_units = enc_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(self.enc_units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')

    def call(self, x, hidden):
        x = self.embedding(x)
        output, state = self.gru(x, initial_state=hidden)
        return output, state

    def initialize_hidden_state(self):
        return tf.zeros((self.batch_sz, self.enc_units))
```
```py
encoder = Encoder(vocab_inp_size, embedding_dim, units, BATCH_SIZE)
# sample input
sample_hidden = encoder.initialize_hidden_state()
sample_output, sample_hidden = encoder(example_input_batch, sample_hidden)
print ('Encoder output shape: (batch size, sequence length, units) {}'.format(sample_output.shape))
print ('Encoder Hidden state shape: (batch size, units) {}'.format(sample_hidden.shape))
```
```py
Encoder output shape: (batch size, sequence length, units) (64, 16, 1024)
Encoder Hidden state shape: (batch size, units) (64, 1024)
```
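Before defining the trainable attention layer, the pseudo-code above can be sketched shape-by-shape in NumPy, with random matrices standing in for the FC (Dense) layers; the sizes match the encoder output shown above.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, max_len, hidden, units = 64, 16, 1024, 10

EO = rng.normal(size=(batch, max_len, hidden))  # encoder output
H = rng.normal(size=(batch, hidden))            # encoder hidden state

# Random stand-ins for the three FC layers (W1, W2, V in the Keras layer)
W1 = rng.normal(size=(hidden, units))
W2 = rng.normal(size=(hidden, units))
V = rng.normal(size=(units, 1))

# score = FC(tanh(FC(EO) + FC(H)))
score = np.tanh(EO @ W1 + (H @ W2)[:, None, :]) @ V    # (batch, max_len, 1)

# attention weights = softmax(score, axis=1)
e = np.exp(score - score.max(axis=1, keepdims=True))
attention_weights = e / e.sum(axis=1, keepdims=True)

# context vector = sum(attention weights * EO, axis=1)
context_vector = (attention_weights * EO).sum(axis=1)  # (batch, hidden)
print(score.shape, attention_weights.shape, context_vector.shape)
```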
```py
class BahdanauAttention(tf.keras.layers.Layer):
def __init__(self, units):
super(BahdanauAttention, self).__init__()
self.W1 = tf.keras.layers.Dense(units)
self.W2 = tf.keras.layers.Dense(units)
self.V = tf.keras.layers.Dense(1)
def call(self, query, values):
    # hidden shape == (batch_size, hidden size)
    # hidden_with_time_axis shape == (batch_size, 1, hidden size)
    # this is done to perform addition in order to calculate the score
hidden_with_time_axis = tf.expand_dims(query, 1)
    # score shape == (batch_size, max_length, 1)
    # we get 1 at the last axis because we are applying score to self.V
    # the shape of the tensor before applying self.V is (batch_size, max_length, units)
score = self.V(tf.nn.tanh(
self.W1(values) + self.W2(hidden_with_time_axis)))
    # attention_weights shape == (batch_size, max_length, 1)
attention_weights = tf.nn.softmax(score, axis=1)
    # context_vector shape after sum == (batch_size, hidden_size)
context_vector = attention_weights * values
context_vector = tf.reduce_sum(context_vector, axis=1)
return context_vector, attention_weights
```
```py
attention_layer = BahdanauAttention(10)
attention_result, attention_weights = attention_layer(sample_hidden, sample_output)
print("Attention result shape: (batch size, units) {}".format(attention_result.shape))
print("Attention weights shape: (batch_size, sequence_length, 1) {}".format(attention_weights.shape))
```
```py
Attention result shape: (batch size, units) (64, 1024)
Attention weights shape: (batch_size, sequence_length, 1) (64, 16, 1)
```
```py
class Decoder(tf.keras.Model):
def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz):
super(Decoder, self).__init__()
self.batch_sz = batch_sz
self.dec_units = dec_units
self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
self.gru = tf.keras.layers.GRU(self.dec_units,
return_sequences=True,
return_state=True,
recurrent_initializer='glorot_uniform')
self.fc = tf.keras.layers.Dense(vocab_size)
    # used for attention
self.attention = BahdanauAttention(self.dec_units)
def call(self, x, hidden, enc_output):
    # enc_output shape == (batch_size, max_length, hidden_size)
context_vector, attention_weights = self.attention(hidden, enc_output)
    # x shape after passing through embedding == (batch_size, 1, embedding_dim)
x = self.embedding(x)
    # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)
    # passing the concatenated vector to the GRU
output, state = self.gru(x)
    # output shape == (batch_size * 1, hidden_size)
output = tf.reshape(output, (-1, output.shape[2]))
    # output shape == (batch_size, vocab)
x = self.fc(output)
return x, state, attention_weights
```
```py
decoder = Decoder(vocab_tar_size, embedding_dim, units, BATCH_SIZE)
sample_decoder_output, _, _ = decoder(tf.random.uniform((64, 1)),
sample_hidden, sample_output)
print ('Decoder output shape: (batch_size, vocab size) {}'.format(sample_decoder_output.shape))
```
```py
Decoder output shape: (batch_size, vocab size) (64, 4935)
```
## Define the optimizer and the loss function
```py
optimizer = tf.keras.optimizers.Adam()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
from_logits=True, reduction='none')
def loss_function(real, pred):
mask = tf.math.logical_not(tf.math.equal(real, 0))
loss_ = loss_object(real, pred)
mask = tf.cast(mask, dtype=loss_.dtype)
loss_ *= mask
return tf.reduce_mean(loss_)
```
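The effect of the mask is that padded positions (token id 0) contribute zero loss. A minimal NumPy re-implementation of the same idea (a sketch, not TensorFlow's own code) makes that visible:

```py
import numpy as np

def masked_loss(real, logits):
    # real: (batch,) integer targets; logits: (batch, vocab) unnormalized scores
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    per_example = -log_probs[np.arange(len(real)), real]
    mask = (real != 0).astype(per_example.dtype)  # 0 is the padding id
    return (per_example * mask).mean()

logits = np.zeros((2, 4))                    # uniform predictions over 4 tokens
loss = masked_loss(np.array([2, 0]), logits)
# only the first target counts, so loss == log(4) / 2
```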
## Checkpoints (object-based saving)
```py
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
checkpoint = tf.train.Checkpoint(optimizer=optimizer,
encoder=encoder,
decoder=decoder)
```
## Training
1. Pass the *input* through the *encoder*, which returns the *encoder output* and the *encoder hidden state*.
2. The encoder output, the encoder hidden state and the decoder input (which is the *start token*) are passed to the decoder.
3. The decoder returns the *predictions* and the *decoder hidden state*.
4. The decoder hidden state is then passed back into the model, and the predictions are used to calculate the loss.
5. Use *teacher forcing* to decide the next input to the decoder.
6. *Teacher forcing* is the technique where the *target word* is passed as the *next input* to the decoder.
7. The final step is to calculate the gradients and apply them to the optimizer for backpropagation.
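The teacher-forcing step can be seen in miniature below. `predict` is a hypothetical stand-in for the decoder; the only point is that the ground-truth token, not the model's own prediction, becomes the next decoder input:

```py
target = [1, 4, 9, 2]          # e.g. <start>, two words, <end>

def predict(prev_token):
    # hypothetical stand-in for decoder(dec_input, dec_hidden, enc_output)
    return prev_token + 1

decoder_inputs, predictions = [], []
for t in range(1, len(target)):
    decoder_inputs.append(target[t - 1])        # ground-truth token goes in
    predictions.append(predict(target[t - 1]))
# decoder_inputs == [1, 4, 9]: the model never consumes its own output
```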
```py
@tf.function
def train_step(inp, targ, enc_hidden):
loss = 0
with tf.GradientTape() as tape:
enc_output, enc_hidden = encoder(inp, enc_hidden)
dec_hidden = enc_hidden
dec_input = tf.expand_dims([targ_lang.word_index['<start>']] * BATCH_SIZE, 1)
    # teacher forcing - feeding the target as the next input
for t in range(1, targ.shape[1]):
    # passing enc_output to the decoder
predictions, dec_hidden, _ = decoder(dec_input, dec_hidden, enc_output)
loss += loss_function(targ[:, t], predictions)
    # using teacher forcing
dec_input = tf.expand_dims(targ[:, t], 1)
batch_loss = (loss / int(targ.shape[1]))
variables = encoder.trainable_variables + decoder.trainable_variables
gradients = tape.gradient(loss, variables)
optimizer.apply_gradients(zip(gradients, variables))
return batch_loss
```
```py
EPOCHS = 10
for epoch in range(EPOCHS):
start = time.time()
enc_hidden = encoder.initialize_hidden_state()
total_loss = 0
for (batch, (inp, targ)) in enumerate(dataset.take(steps_per_epoch)):
batch_loss = train_step(inp, targ, enc_hidden)
total_loss += batch_loss
if batch % 100 == 0:
print('Epoch {} Batch {} Loss {:.4f}'.format(epoch + 1,
batch,
batch_loss.numpy()))
  # saving (checkpointing) the model every 2 epochs
if (epoch + 1) % 2 == 0:
checkpoint.save(file_prefix = checkpoint_prefix)
print('Epoch {} Loss {:.4f}'.format(epoch + 1,
total_loss / steps_per_epoch))
print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))
```
```py
Epoch 1 Batch 0 Loss 4.6508
Epoch 1 Batch 100 Loss 2.1923
Epoch 1 Batch 200 Loss 1.7957
Epoch 1 Batch 300 Loss 1.7889
Epoch 1 Loss 2.0564
Time taken for 1 epoch 28.358328819274902 sec
Epoch 2 Batch 0 Loss 1.5558
Epoch 2 Batch 100 Loss 1.5256
Epoch 2 Batch 200 Loss 1.4604
Epoch 2 Batch 300 Loss 1.3006
Epoch 2 Loss 1.4770
Time taken for 1 epoch 16.062172651290894 sec
Epoch 3 Batch 0 Loss 1.1928
Epoch 3 Batch 100 Loss 1.1909
Epoch 3 Batch 200 Loss 1.0559
Epoch 3 Batch 300 Loss 0.9279
Epoch 3 Loss 1.1305
Time taken for 1 epoch 15.620810270309448 sec
Epoch 4 Batch 0 Loss 0.8910
Epoch 4 Batch 100 Loss 0.7890
Epoch 4 Batch 200 Loss 0.8234
Epoch 4 Batch 300 Loss 0.8448
Epoch 4 Loss 0.8080
Time taken for 1 epoch 15.983836889266968 sec
Epoch 5 Batch 0 Loss 0.4728
Epoch 5 Batch 100 Loss 0.7090
Epoch 5 Batch 200 Loss 0.6280
Epoch 5 Batch 300 Loss 0.5421
Epoch 5 Loss 0.5710
Time taken for 1 epoch 15.588238716125488 sec
Epoch 6 Batch 0 Loss 0.4209
Epoch 6 Batch 100 Loss 0.3995
Epoch 6 Batch 200 Loss 0.4426
Epoch 6 Batch 300 Loss 0.4470
Epoch 6 Loss 0.4063
Time taken for 1 epoch 15.882423639297485 sec
Epoch 7 Batch 0 Loss 0.2503
Epoch 7 Batch 100 Loss 0.3373
Epoch 7 Batch 200 Loss 0.3342
Epoch 7 Batch 300 Loss 0.2955
Epoch 7 Loss 0.2938
Time taken for 1 epoch 15.601640939712524 sec
Epoch 8 Batch 0 Loss 0.1662
Epoch 8 Batch 100 Loss 0.1923
Epoch 8 Batch 200 Loss 0.2131
Epoch 8 Batch 300 Loss 0.2464
Epoch 8 Loss 0.2175
Time taken for 1 epoch 15.917790412902832 sec
Epoch 9 Batch 0 Loss 0.1450
Epoch 9 Batch 100 Loss 0.1351
Epoch 9 Batch 200 Loss 0.2102
Epoch 9 Batch 300 Loss 0.2188
Epoch 9 Loss 0.1659
Time taken for 1 epoch 15.727098941802979 sec
Epoch 10 Batch 0 Loss 0.0995
Epoch 10 Batch 100 Loss 0.1190
Epoch 10 Batch 200 Loss 0.1444
Epoch 10 Batch 300 Loss 0.1280
Epoch 10 Loss 0.1294
Time taken for 1 epoch 15.857161045074463 sec
```
## Translate
* The evaluate function is similar to the training loop, except we don't use *teacher forcing* here. The input to the decoder at each time step is its previous prediction, along with the hidden state and the encoder output.
* Stop predicting when the model predicts the *end token*.
* And store the *attention weights for every time step*.
Note: the encoder output is calculated only once per input.
```py
def evaluate(sentence):
attention_plot = np.zeros((max_length_targ, max_length_inp))
sentence = preprocess_sentence(sentence)
inputs = [inp_lang.word_index[i] for i in sentence.split(' ')]
inputs = tf.keras.preprocessing.sequence.pad_sequences([inputs],
maxlen=max_length_inp,
padding='post')
inputs = tf.convert_to_tensor(inputs)
result = ''
hidden = [tf.zeros((1, units))]
enc_out, enc_hidden = encoder(inputs, hidden)
dec_hidden = enc_hidden
dec_input = tf.expand_dims([targ_lang.word_index['<start>']], 0)
for t in range(max_length_targ):
predictions, dec_hidden, attention_weights = decoder(dec_input,
dec_hidden,
enc_out)
        # storing the attention weights to plot later on
attention_weights = tf.reshape(attention_weights, (-1, ))
attention_plot[t] = attention_weights.numpy()
predicted_id = tf.argmax(predictions[0]).numpy()
result += targ_lang.index_word[predicted_id] + ' '
if targ_lang.index_word[predicted_id] == '<end>':
return result, sentence, attention_plot
        # the predicted ID is fed back into the model
dec_input = tf.expand_dims([predicted_id], 0)
return result, sentence, attention_plot
```
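Stripped of the model, `evaluate` is a greedy decoding loop: feed each predicted id back in and stop at the end token. A sketch with a hypothetical `decoder_step` standing in for the trained decoder:

```py
def decoder_step(token, state):
    # hypothetical stand-in for decoder(dec_input, dec_hidden, enc_out)
    table = {1: 5, 5: 7, 7: 2}       # pretend 1 = <start>, 2 = <end>
    return table.get(token, 2), state

def greedy_decode(start_id, end_id, max_steps):
    token, state, result = start_id, None, []
    for _ in range(max_steps):       # same bound as range(max_length_targ)
        token, state = decoder_step(token, state)
        result.append(token)
        if token == end_id:          # stop once <end> is predicted
            break
    return result
# greedy_decode(1, 2, 10) follows the chain <start> -> 5 -> 7 -> <end>
```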
```py
# function for plotting the attention weights
def plot_attention(attention, sentence, predicted_sentence):
fig = plt.figure(figsize=(10,10))
ax = fig.add_subplot(1, 1, 1)
ax.matshow(attention, cmap='viridis')
fontdict = {'fontsize': 14}
ax.set_xticklabels([''] + sentence, fontdict=fontdict, rotation=90)
ax.set_yticklabels([''] + predicted_sentence, fontdict=fontdict)
ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
ax.yaxis.set_major_locator(ticker.MultipleLocator(1))
plt.show()
```
```py
def translate(sentence):
result, sentence, attention_plot = evaluate(sentence)
print('Input: %s' % (sentence))
print('Predicted translation: {}'.format(result))
attention_plot = attention_plot[:len(result.split(' ')), :len(sentence.split(' '))]
plot_attention(attention_plot, sentence.split(' '), result.split(' '))
```
## Restore the latest checkpoint and test
```py
# restoring the latest checkpoint in checkpoint_dir
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))
```
```py
<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7f3d31e73f98>
```
```py
translate(u'hace mucho frio aqui.')
```
```py
Input: <start> hace mucho frio aqui . <end>
Predicted translation: it s very cold here . <end>
```
![png](img/86f4e22b402c9e48d76da7068ace2175.png)
```py
translate(u'esta es mi vida.')
```
```py
Input: <start> esta es mi vida . <end>
Predicted translation: this is my life . <end>
```
![png](img/5ae7b3b0f94a71db86b4168d116179ff.png)
```py
translate(u'¿todavia estan en casa?')
```
```py
Input: <start> ¿ todavia estan en casa ? <end>
Predicted translation: are you still at home ? <end>
```
![png](img/3e8e9f9ba0ac0f802575b228ffa360c0.png)
```py
# wrong translation
translate(u'trata de averiguarlo.')
```
```py
Input: <start> trata de averiguarlo . <end>
Predicted translation: try to be coming . <end>
```
![png](img/996d41e44b9998dc439ec88b9b370cec.png)
## Next steps
* [Download a different dataset](http://www.manythings.org/anki/) to experiment with translations, for example, English to German or English to French.
* Experiment with training on a larger dataset, or with more epochs.

View File

@@ -1,819 +0,0 @@
# Image captioning with visual attention
> 原文:[https://tensorflow.google.cn/tutorials/text/image_captioning](https://tensorflow.google.cn/tutorials/text/image_captioning)
<devsite-mathjax config="TeX-AMS-MML_SVG"></devsite-mathjax>
Given an image like the example below, our goal is to generate a caption such as "a surfer riding on a wave".
![Man Surfing](img/72fcb6a7bcc602106e2c60268d3642c5.png)
*[Image Source](https://commons.wikimedia.org/wiki/Surfing#/media/File:Surfing_in_Hawaii.jpg); License: Public Domain*
To accomplish this, you'll use an attention-based model, which enables us to see what parts of the image the model focuses on as it generates a caption.
![Prediction](img/7534c154062dc8f522f01d83838f3161.png)
The model architecture is similar to [Show, Attend and Tell: Neural Image Caption Generation with Visual Attention](https://arxiv.org/abs/1502.03044).
This notebook is an end-to-end example. When you run the notebook, it downloads the [MS-COCO](http://cocodataset.org/#home) dataset, preprocesses and caches a subset of images using Inception V3, trains an encoder-decoder model, and generates captions on new images using the trained model.
In this example, you will train a model on a relatively small amount of data—the first 30,000 captions for about 20,000 images (because there are multiple captions per image in the dataset).
```py
import tensorflow as tf
# You'll generate plots of attention in order to see which parts of an image
# our model focuses on during captioning
import matplotlib.pyplot as plt
# Scikit-learn includes many helpful utilities
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
import collections
import random
import re
import numpy as np
import os
import time
import json
from glob import glob
from PIL import Image
import pickle
```
## Download and prepare the MS-COCO dataset
You will use the [MS-COCO dataset](http://cocodataset.org/#home) to train our model. The dataset contains over 82,000 images, each of which has at least 5 different caption annotations. The code below downloads and extracts the dataset automatically.
**Caution: large download ahead.** You'll use the training set, which is a 13GB file.
```py
# Download caption annotation files
annotation_folder = '/annotations/'
if not os.path.exists(os.path.abspath('.') + annotation_folder):
annotation_zip = tf.keras.utils.get_file('captions.zip',
cache_subdir=os.path.abspath('.'),
origin = 'http://images.cocodataset.org/annotations/annotations_trainval2014.zip',
extract = True)
annotation_file = os.path.dirname(annotation_zip)+'/annotations/captions_train2014.json'
os.remove(annotation_zip)
# Download image files
image_folder = '/train2014/'
if not os.path.exists(os.path.abspath('.') + image_folder):
image_zip = tf.keras.utils.get_file('train2014.zip',
cache_subdir=os.path.abspath('.'),
origin = 'http://images.cocodataset.org/zips/train2014.zip',
extract = True)
PATH = os.path.dirname(image_zip) + image_folder
os.remove(image_zip)
else:
PATH = os.path.abspath('.') + image_folder
```
```py
Downloading data from http://images.cocodataset.org/annotations/annotations_trainval2014.zip
252878848/252872794 [==============================] - 7s 0us/step
Downloading data from http://images.cocodataset.org/zips/train2014.zip
13510574080/13510573713 [==============================] - 374s 0us/step
```
## Optional: limit the size of the training set
To speed up training for this tutorial, you'll use a subset of 30,000 captions and their corresponding images to train our model. Choosing to use more data would result in improved captioning quality.
```py
with open(annotation_file, 'r') as f:
annotations = json.load(f)
```
```py
# Group all captions together having the same image ID.
image_path_to_caption = collections.defaultdict(list)
for val in annotations['annotations']:
caption = f"<start> {val['caption']} <end>"
image_path = PATH + 'COCO_train2014_' + '%012d.jpg' % (val['image_id'])
image_path_to_caption[image_path].append(caption)
```
```py
image_paths = list(image_path_to_caption.keys())
random.shuffle(image_paths)
# Select the first 6000 image_paths from the shuffled set.
# Approximately each image id has 5 captions associated with it, so that will
# lead to 30,000 examples.
train_image_paths = image_paths[:6000]
print(len(train_image_paths))
```
```py
6000
```
```py
train_captions = []
img_name_vector = []
for image_path in train_image_paths:
caption_list = image_path_to_caption[image_path]
train_captions.extend(caption_list)
img_name_vector.extend([image_path] * len(caption_list))
```
```py
print(train_captions[0])
Image.open(img_name_vector[0])
```
```py
<start> a woman in a blue dress is playing tennis <end>
```
![png](img/77a9a1e4b542e966076c493155a71253.png)
## Preprocess the images using InceptionV3
Next, you will use InceptionV3 (which is pretrained on Imagenet) to classify each image. You will extract features from the last convolutional layer.
First, you will convert the images into InceptionV3's expected format by:
* Resizing the image to 299px by 299px
* [Preprocess the images](https://cloud.google.com/tpu/docs/inception-v3-advanced#preprocessing_stage) using the [preprocess_input](https://tensorflow.google.cn/api_docs/python/tf/keras/applications/inception_v3/preprocess_input) method to normalize the image so that it contains pixels in the range of -1 to 1, which matches the format of the images used to train InceptionV3.
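The [-1, 1] normalization performed by `preprocess_input` amounts to the rescaling below (a NumPy sketch of the arithmetic, not the Keras implementation itself):

```py
import numpy as np

img = np.random.randint(0, 256, size=(299, 299, 3)).astype(np.float32)
scaled = img / 127.5 - 1.0    # maps pixel values in [0, 255] onto [-1, 1]
```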
```py
def load_image(image_path):
img = tf.io.read_file(image_path)
img = tf.image.decode_jpeg(img, channels=3)
img = tf.image.resize(img, (299, 299))
img = tf.keras.applications.inception_v3.preprocess_input(img)
return img, image_path
```
## Initialize InceptionV3 and load the pretrained Imagenet weights
Now you'll create a tf.keras model where the output layer is the last convolutional layer in the InceptionV3 architecture. The shape of the output of this layer is `8x8x2048`. You use the last convolutional layer because you are using attention in this example. You don't perform this initialization during training because it could become a bottleneck.
* You forward each image through the network and store the resulting vector in a dictionary (image_name --> feature_vector).
* After all the images are passed through the network, you pickle the dictionary and save it to disk.
```py
image_model = tf.keras.applications.InceptionV3(include_top=False,
weights='imagenet')
new_input = image_model.input
hidden_layer = image_model.layers[-1].output
image_features_extract_model = tf.keras.Model(new_input, hidden_layer)
```
```py
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/inception_v3/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5
87916544/87910968 [==============================] - 1s 0us/step
```
## Caching the features extracted from InceptionV3
You will pre-process each image with InceptionV3 and cache the output to disk. Caching the output in RAM would be faster but also memory intensive, requiring 8 * 8 * 2048 floats per image. At the time of writing, this exceeds the memory limitations of Colab (currently 12GB of memory).
Performance could be improved with a more sophisticated caching strategy (for example, by sharding the images to reduce random access disk I/O), but that would require more code.
The caching will take about 10 minutes to run in Colab with a GPU. If you'd like to see a progress bar, you can:
1. install [tqdm](https://github.com/tqdm/tqdm):
`!pip install -q tqdm`
2. Import tqdm:
`from tqdm import tqdm`
3. Change the following line:
`for img, path in image_dataset:`
to:
`for img, path in tqdm(image_dataset):`
```py
# Get unique images
encode_train = sorted(set(img_name_vector))
# Feel free to change batch_size according to your system configuration
image_dataset = tf.data.Dataset.from_tensor_slices(encode_train)
image_dataset = image_dataset.map(
load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE).batch(16)
for img, path in image_dataset:
batch_features = image_features_extract_model(img)
batch_features = tf.reshape(batch_features,
(batch_features.shape[0], -1, batch_features.shape[3]))
for bf, p in zip(batch_features, path):
path_of_feature = p.numpy().decode("utf-8")
np.save(path_of_feature, bf.numpy())
```
## Preprocess and tokenize the captions
* First, you'll tokenize the captions (for example, by splitting on spaces). This gives us a vocabulary of all of the unique words in the data (for example, "surfing", "football", and so on).
* Next, you'll limit the vocabulary size to the top 5,000 words (to save memory). You'll replace all other words with the token "UNK" (unknown).
* You then create word-to-index and index-to-word mappings.
* Finally, you pad all sequences to be the same length as the longest one.
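In outline, tokenize-then-pad does something like the following plain-Python sketch (the real `Tokenizer` also lowercases and strips punctuation):

```py
from collections import Counter

captions = ["<start> a man surfing <end>", "<start> a dog runs on grass <end>"]
counts = Counter(word for cap in captions for word in cap.split())

# index 0 is reserved for <pad>; remaining words are indexed by frequency
word_index = {w: i + 1 for i, (w, _) in enumerate(counts.most_common())}
seqs = [[word_index[w] for w in cap.split()] for cap in captions]

max_length = max(len(s) for s in seqs)
padded = [s + [0] * (max_length - len(s)) for s in seqs]  # post-padding
```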
```py
# Find the maximum length of any caption in our dataset
def calc_max_length(tensor):
return max(len(t) for t in tensor)
```
```py
# Choose the top 5000 words from the vocabulary
top_k = 5000
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=top_k,
oov_token="<unk>",
filters='!"#$%&()*+.,-/:;=?@[\]^_`{|}~ ')
tokenizer.fit_on_texts(train_captions)
train_seqs = tokenizer.texts_to_sequences(train_captions)
```
```py
tokenizer.word_index['<pad>'] = 0
tokenizer.index_word[0] = '<pad>'
```
```py
# Create the tokenized vectors
train_seqs = tokenizer.texts_to_sequences(train_captions)
```
```py
# Pad each vector to the max_length of the captions
# If you do not provide a max_length value, pad_sequences calculates it automatically
cap_vector = tf.keras.preprocessing.sequence.pad_sequences(train_seqs, padding='post')
```
```py
# Calculates the max_length, which is used to store the attention weights
max_length = calc_max_length(train_seqs)
```
## Split the data into training and testing
```py
img_to_cap_vector = collections.defaultdict(list)
for img, cap in zip(img_name_vector, cap_vector):
img_to_cap_vector[img].append(cap)
# Create training and validation sets using a random 80-20 split.
img_keys = list(img_to_cap_vector.keys())
random.shuffle(img_keys)
slice_index = int(len(img_keys)*0.8)
img_name_train_keys, img_name_val_keys = img_keys[:slice_index], img_keys[slice_index:]
img_name_train = []
cap_train = []
for imgt in img_name_train_keys:
capt_len = len(img_to_cap_vector[imgt])
img_name_train.extend([imgt] * capt_len)
cap_train.extend(img_to_cap_vector[imgt])
img_name_val = []
cap_val = []
for imgv in img_name_val_keys:
capv_len = len(img_to_cap_vector[imgv])
img_name_val.extend([imgv] * capv_len)
cap_val.extend(img_to_cap_vector[imgv])
```
```py
len(img_name_train), len(cap_train), len(img_name_val), len(cap_val)
```
```py
(24009, 24009, 6001, 6001)
```
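The split is done over image keys rather than over individual captions, so all captions for one image land on the same side. In miniature (toy data, seeded for determinism):

```py
import random

img_to_caps = {"img_a": ["c1", "c2"], "img_b": ["c3"], "img_c": ["c4", "c5", "c6"]}
keys = sorted(img_to_caps)          # deterministic starting order for this sketch
random.Random(0).shuffle(keys)
cut = int(len(keys) * 0.8)
train_keys, val_keys = keys[:cut], keys[cut:]
train_caps = [c for k in train_keys for c in img_to_caps[k]]
val_caps = [c for k in val_keys for c in img_to_caps[k]]
# every caption ends up on exactly one side, grouped by image
```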
## Create a tf.data dataset for training
Our images and captions are ready! Next, let's create a tf.data dataset to use for training our model.
```py
# Feel free to change these parameters according to your system's configuration
BATCH_SIZE = 64
BUFFER_SIZE = 1000
embedding_dim = 256
units = 512
vocab_size = top_k + 1
num_steps = len(img_name_train) // BATCH_SIZE
# Shape of the vector extracted from InceptionV3 is (64, 2048)
# These two variables represent that vector shape
features_shape = 2048
attention_features_shape = 64
```
```py
# Load the numpy files
def map_func(img_name, cap):
img_tensor = np.load(img_name.decode('utf-8')+'.npy')
return img_tensor, cap
```
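The cached features round-trip through `np.save`/`np.load` exactly as `map_func` expects: saving to the image path writes `path + '.npy'`, which is what gets loaded back. A self-contained sketch of that contract:

```py
import os
import tempfile

import numpy as np

feat = np.arange(6, dtype=np.float32).reshape(2, 3)
path = os.path.join(tempfile.mkdtemp(), "img.jpg")
np.save(path, feat)              # numpy appends '.npy' to the filename
loaded = np.load(path + '.npy')  # loaded is bit-for-bit equal to feat
```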
```py
dataset = tf.data.Dataset.from_tensor_slices((img_name_train, cap_train))
# Use map to load the numpy files in parallel
dataset = dataset.map(lambda item1, item2: tf.numpy_function(
map_func, [item1, item2], [tf.float32, tf.int32]),
num_parallel_calls=tf.data.experimental.AUTOTUNE)
# Shuffle and batch
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
```
## Model
Fun fact: the decoder below is identical to the one in the example for [Neural Machine Translation with Attention](https://tensorflow.google.cn/tutorials/sequences/nmt_with_attention).
The model architecture is inspired by the [Show, Attend and Tell](https://arxiv.org/pdf/1502.03044.pdf) paper.
* In this example, you extract the features from the lower convolutional layer of InceptionV3 giving us a vector of shape (8, 8, 2048).
* You squash that to a shape of (64, 2048).
* This vector is then passed through the CNN Encoder (which consists of a single Fully connected layer).
* The RNN (here GRU) attends over the image to predict the next word.
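The "squash" from (8, 8, 2048) to (64, 2048) is just a reshape that flattens the 8×8 spatial grid into 64 locations the decoder can attend over (NumPy sketch):

```py
import numpy as np

features = np.zeros((1, 8, 8, 2048), dtype=np.float32)  # one image's feature map
flat = features.reshape(features.shape[0], -1, features.shape[3])
# flat.shape == (1, 64, 2048)
```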
```py
class BahdanauAttention(tf.keras.Model):
def __init__(self, units):
super(BahdanauAttention, self).__init__()
self.W1 = tf.keras.layers.Dense(units)
self.W2 = tf.keras.layers.Dense(units)
self.V = tf.keras.layers.Dense(1)
def call(self, features, hidden):
# features(CNN_encoder output) shape == (batch_size, 64, embedding_dim)
# hidden shape == (batch_size, hidden_size)
# hidden_with_time_axis shape == (batch_size, 1, hidden_size)
hidden_with_time_axis = tf.expand_dims(hidden, 1)
# attention_hidden_layer shape == (batch_size, 64, units)
attention_hidden_layer = (tf.nn.tanh(self.W1(features) +
self.W2(hidden_with_time_axis)))
# score shape == (batch_size, 64, 1)
# This gives you an unnormalized score for each image feature.
score = self.V(attention_hidden_layer)
# attention_weights shape == (batch_size, 64, 1)
attention_weights = tf.nn.softmax(score, axis=1)
# context_vector shape after sum == (batch_size, hidden_size)
context_vector = attention_weights * features
context_vector = tf.reduce_sum(context_vector, axis=1)
return context_vector, attention_weights
```
```py
class CNN_Encoder(tf.keras.Model):
# Since you have already extracted the features and dumped it using pickle
# This encoder passes those features through a Fully connected layer
def __init__(self, embedding_dim):
super(CNN_Encoder, self).__init__()
# shape after fc == (batch_size, 64, embedding_dim)
self.fc = tf.keras.layers.Dense(embedding_dim)
def call(self, x):
x = self.fc(x)
x = tf.nn.relu(x)
return x
```
```py
class RNN_Decoder(tf.keras.Model):
def __init__(self, embedding_dim, units, vocab_size):
super(RNN_Decoder, self).__init__()
self.units = units
self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
self.gru = tf.keras.layers.GRU(self.units,
return_sequences=True,
return_state=True,
recurrent_initializer='glorot_uniform')
self.fc1 = tf.keras.layers.Dense(self.units)
self.fc2 = tf.keras.layers.Dense(vocab_size)
self.attention = BahdanauAttention(self.units)
def call(self, x, features, hidden):
# defining attention as a separate model
context_vector, attention_weights = self.attention(features, hidden)
# x shape after passing through embedding == (batch_size, 1, embedding_dim)
x = self.embedding(x)
# x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)
# passing the concatenated vector to the GRU
output, state = self.gru(x)
# shape == (batch_size, max_length, hidden_size)
x = self.fc1(output)
# x shape == (batch_size * max_length, hidden_size)
x = tf.reshape(x, (-1, x.shape[2]))
# output shape == (batch_size * max_length, vocab)
x = self.fc2(x)
return x, state, attention_weights
def reset_state(self, batch_size):
return tf.zeros((batch_size, self.units))
```
```py
encoder = CNN_Encoder(embedding_dim)
decoder = RNN_Decoder(embedding_dim, units, vocab_size)
```
```py
optimizer = tf.keras.optimizers.Adam()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
from_logits=True, reduction='none')
def loss_function(real, pred):
mask = tf.math.logical_not(tf.math.equal(real, 0))
loss_ = loss_object(real, pred)
mask = tf.cast(mask, dtype=loss_.dtype)
loss_ *= mask
return tf.reduce_mean(loss_)
```
## Checkpoint
```py
checkpoint_path = "./checkpoints/train"
ckpt = tf.train.Checkpoint(encoder=encoder,
decoder=decoder,
optimizer = optimizer)
ckpt_manager = tf.train.CheckpointManager(ckpt, checkpoint_path, max_to_keep=5)
```
```py
start_epoch = 0
if ckpt_manager.latest_checkpoint:
start_epoch = int(ckpt_manager.latest_checkpoint.split('-')[-1])
# restoring the latest checkpoint in checkpoint_path
ckpt.restore(ckpt_manager.latest_checkpoint)
```
## Training
* You extract the features stored in the respective `.npy` files and then pass those features through the encoder.
* The encoder output, hidden state(initialized to 0) and the decoder input (which is the start token) is passed to the decoder.
* The decoder returns the predictions and the decoder hidden state.
* The decoder hidden state is then passed back into the model and the predictions are used to calculate the loss.
* Use teacher forcing to decide the next input to the decoder.
* Teacher forcing is the technique where the target word is passed as the next input to the decoder.
* The final step is to calculate the gradients and apply them to the optimizer and backpropagate.
```py
# adding this in a separate cell because if you run the training cell
# many times, the loss_plot array will be reset
loss_plot = []
```
```py
@tf.function
def train_step(img_tensor, target):
loss = 0
# initializing the hidden state for each batch
# because the captions are not related from image to image
hidden = decoder.reset_state(batch_size=target.shape[0])
dec_input = tf.expand_dims([tokenizer.word_index['<start>']] * target.shape[0], 1)
with tf.GradientTape() as tape:
features = encoder(img_tensor)
for i in range(1, target.shape[1]):
# passing the features through the decoder
predictions, hidden, _ = decoder(dec_input, features, hidden)
loss += loss_function(target[:, i], predictions)
# using teacher forcing
dec_input = tf.expand_dims(target[:, i], 1)
total_loss = (loss / int(target.shape[1]))
trainable_variables = encoder.trainable_variables + decoder.trainable_variables
gradients = tape.gradient(loss, trainable_variables)
optimizer.apply_gradients(zip(gradients, trainable_variables))
return loss, total_loss
```
```py
EPOCHS = 20
for epoch in range(start_epoch, EPOCHS):
start = time.time()
total_loss = 0
for (batch, (img_tensor, target)) in enumerate(dataset):
batch_loss, t_loss = train_step(img_tensor, target)
total_loss += t_loss
if batch % 100 == 0:
print ('Epoch {} Batch {} Loss {:.4f}'.format(
epoch + 1, batch, batch_loss.numpy() / int(target.shape[1])))
# storing the epoch end loss value to plot later
loss_plot.append(total_loss / num_steps)
if epoch % 5 == 0:
ckpt_manager.save()
print ('Epoch {} Loss {:.6f}'.format(epoch + 1,
total_loss/num_steps))
print ('Time taken for 1 epoch {} sec\n'.format(time.time() - start))
```
```py
Epoch 1 Batch 0 Loss 2.0618
Epoch 1 Batch 100 Loss 1.1516
Epoch 1 Batch 200 Loss 0.9201
Epoch 1 Batch 300 Loss 0.8922
Epoch 1 Loss 1.040854
Time taken for 1 epoch 100.07987594604492 sec
Epoch 2 Batch 0 Loss 0.8678
Epoch 2 Batch 100 Loss 0.8257
Epoch 2 Batch 200 Loss 0.8268
Epoch 2 Batch 300 Loss 0.7109
Epoch 2 Loss 0.786627
Time taken for 1 epoch 36.52699089050293 sec
Epoch 3 Batch 0 Loss 0.7747
Epoch 3 Batch 100 Loss 0.7220
Epoch 3 Batch 200 Loss 0.7071
Epoch 3 Batch 300 Loss 0.7065
Epoch 3 Loss 0.708941
Time taken for 1 epoch 36.67209577560425 sec
Epoch 4 Batch 0 Loss 0.7542
Epoch 4 Batch 100 Loss 0.6422
Epoch 4 Batch 200 Loss 0.6024
Epoch 4 Batch 300 Loss 0.7107
Epoch 4 Loss 0.657265
Time taken for 1 epoch 36.70520520210266 sec
Epoch 5 Batch 0 Loss 0.6684
Epoch 5 Batch 100 Loss 0.6549
Epoch 5 Batch 200 Loss 0.6364
Epoch 5 Batch 300 Loss 0.6250
Epoch 5 Loss 0.616459
Time taken for 1 epoch 36.51219129562378 sec
Epoch 6 Batch 0 Loss 0.6531
Epoch 6 Batch 100 Loss 0.5622
Epoch 6 Batch 200 Loss 0.5688
Epoch 6 Batch 300 Loss 0.6302
Epoch 6 Loss 0.581336
Time taken for 1 epoch 37.36966156959534 sec
Epoch 7 Batch 0 Loss 0.5335
Epoch 7 Batch 100 Loss 0.5362
Epoch 7 Batch 200 Loss 0.5960
Epoch 7 Batch 300 Loss 0.5382
Epoch 7 Loss 0.558110
Time taken for 1 epoch 36.8504319190979 sec
Epoch 8 Batch 0 Loss 0.5242
Epoch 8 Batch 100 Loss 0.5142
Epoch 8 Batch 200 Loss 0.5458
Epoch 8 Batch 300 Loss 0.4814
Epoch 8 Loss 0.523847
Time taken for 1 epoch 36.90491819381714 sec
Epoch 9 Batch 0 Loss 0.5318
Epoch 9 Batch 100 Loss 0.4869
Epoch 9 Batch 200 Loss 0.4791
Epoch 9 Batch 300 Loss 0.4719
Epoch 9 Loss 0.496363
Time taken for 1 epoch 36.52782845497131 sec
Epoch 10 Batch 0 Loss 0.4707
Epoch 10 Batch 100 Loss 0.4642
Epoch 10 Batch 200 Loss 0.4685
Epoch 10 Batch 300 Loss 0.4659
Epoch 10 Loss 0.470341
Time taken for 1 epoch 36.24022054672241 sec
Epoch 11 Batch 0 Loss 0.4530
Epoch 11 Batch 100 Loss 0.4947
Epoch 11 Batch 200 Loss 0.4457
Epoch 11 Batch 300 Loss 0.4617
Epoch 11 Loss 0.447154
Time taken for 1 epoch 36.481024980545044 sec
Epoch 12 Batch 0 Loss 0.4359
Epoch 12 Batch 100 Loss 0.4257
Epoch 12 Batch 200 Loss 0.4124
Epoch 12 Batch 300 Loss 0.4302
Epoch 12 Loss 0.424052
Time taken for 1 epoch 37.11701226234436 sec
Epoch 13 Batch 0 Loss 0.4531
Epoch 13 Batch 100 Loss 0.4064
Epoch 13 Batch 200 Loss 0.3677
Epoch 13 Batch 300 Loss 0.3942
Epoch 13 Loss 0.402709
Time taken for 1 epoch 36.868356466293335 sec
Epoch 14 Batch 0 Loss 0.3967
Epoch 14 Batch 100 Loss 0.3455
Epoch 14 Batch 200 Loss 0.3742
Epoch 14 Batch 300 Loss 0.3905
Epoch 14 Loss 0.382572
Time taken for 1 epoch 36.95557117462158 sec
Epoch 15 Batch 0 Loss 0.3754
Epoch 15 Batch 100 Loss 0.3721
Epoch 15 Batch 200 Loss 0.3633
Epoch 15 Batch 300 Loss 0.3830
Epoch 15 Loss 0.364831
Time taken for 1 epoch 36.37884545326233 sec
Epoch 16 Batch 0 Loss 0.3873
Epoch 16 Batch 100 Loss 0.3499
Epoch 16 Batch 200 Loss 0.3437
Epoch 16 Batch 300 Loss 0.3232
Epoch 16 Loss 0.346227
Time taken for 1 epoch 36.44292426109314 sec
Epoch 17 Batch 0 Loss 0.3250
Epoch 17 Batch 100 Loss 0.3218
Epoch 17 Batch 200 Loss 0.3703
Epoch 17 Batch 300 Loss 0.2849
Epoch 17 Loss 0.328413
Time taken for 1 epoch 36.11301136016846 sec
Epoch 18 Batch 0 Loss 0.3032
Epoch 18 Batch 100 Loss 0.3321
Epoch 18 Batch 200 Loss 0.3112
Epoch 18 Batch 300 Loss 0.3129
Epoch 18 Loss 0.315071
Time taken for 1 epoch 36.2520546913147 sec
Epoch 19 Batch 0 Loss 0.3005
Epoch 19 Batch 100 Loss 0.3190
Epoch 19 Batch 200 Loss 0.3243
Epoch 19 Batch 300 Loss 0.2861
Epoch 19 Loss 0.301502
Time taken for 1 epoch 36.188610553741455 sec
Epoch 20 Batch 0 Loss 0.3263
Epoch 20 Batch 100 Loss 0.3182
Epoch 20 Batch 200 Loss 0.2885
Epoch 20 Batch 300 Loss 0.2923
Epoch 20 Loss 0.285932
Time taken for 1 epoch 36.192723989486694 sec
```
```py
plt.plot(loss_plot)
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Loss Plot')
plt.show()
```
![png](img/f40a6da0d8471d4b9b979d456cb09d0d.png)
## Caption!
* The evaluate function is similar to the training loop, except you don't use teacher forcing here. The input to the decoder at each time step is its previous predictions along with the hidden state and the encoder output.
* Stop predicting when the model predicts the end token.
* And store the attention weights for every time step.
```py
def evaluate(image):
    attention_plot = np.zeros((max_length, attention_features_shape))

    hidden = decoder.reset_state(batch_size=1)

    temp_input = tf.expand_dims(load_image(image)[0], 0)
    img_tensor_val = image_features_extract_model(temp_input)
    img_tensor_val = tf.reshape(img_tensor_val, (img_tensor_val.shape[0], -1, img_tensor_val.shape[3]))

    features = encoder(img_tensor_val)

    dec_input = tf.expand_dims([tokenizer.word_index['<start>']], 0)
    result = []

    for i in range(max_length):
        predictions, hidden, attention_weights = decoder(dec_input, features, hidden)

        attention_plot[i] = tf.reshape(attention_weights, (-1, )).numpy()

        predicted_id = tf.random.categorical(predictions, 1)[0][0].numpy()
        result.append(tokenizer.index_word[predicted_id])

        if tokenizer.index_word[predicted_id] == '<end>':
            return result, attention_plot

        dec_input = tf.expand_dims([predicted_id], 0)

    attention_plot = attention_plot[:len(result), :]
    return result, attention_plot
```
```py
def plot_attention(image, result, attention_plot):
    temp_image = np.array(Image.open(image))

    fig = plt.figure(figsize=(10, 10))

    len_result = len(result)
    for l in range(len_result):
        temp_att = np.resize(attention_plot[l], (8, 8))
        ax = fig.add_subplot(len_result//2, len_result//2, l+1)
        ax.set_title(result[l])
        img = ax.imshow(temp_image)
        ax.imshow(temp_att, cmap='gray', alpha=0.6, extent=img.get_extent())

    plt.tight_layout()
    plt.show()
```
```py
# captions on the validation set
rid = np.random.randint(0, len(img_name_val))
image = img_name_val[rid]
real_caption = ' '.join([tokenizer.index_word[i] for i in cap_val[rid] if i not in [0]])
result, attention_plot = evaluate(image)
print ('Real Caption:', real_caption)
print ('Prediction Caption:', ' '.join(result))
plot_attention(image, result, attention_plot)
```
```py
Real Caption: <start> a <unk> clock is on display on the surface of a building <end>
Prediction Caption: a metal wall with a brick in the middle is <unk> that has some brown wall that looks out the ground <end>
```
![png](img/9cada0d075f4e1a104766ddd3754aba4.png)
## Try it on your own images
For fun, below we've provided a method you can use to caption your own images with the model we've just trained. Keep in mind, it was trained on a relatively small amount of data, and your images may be different from the training data (so be prepared for weird results!)
```py
image_url = 'https://tensorflow.org/images/surf.jpg'
image_extension = image_url[-4:]
image_path = tf.keras.utils.get_file('image'+image_extension,
                                     origin=image_url)
result, attention_plot = evaluate(image_path)
print ('Prediction Caption:', ' '.join(result))
plot_attention(image_path, result, attention_plot)
# opening the image
Image.open(image_path)
```
```py
Downloading data from https://tensorflow.org/images/surf.jpg
65536/64400 [==============================] - 0s 2us/step
Prediction Caption: a kid in <unk> their best to fall <end>
```
![png](img/e3e3424830f874b566c07a0e86696a13.png)
![png](img/17877a5940e1f7245c707d3ecf9783e3.png)
# Next steps
Congrats! You've just trained an image captioning model with attention. Next, take a look at this example [Neural Machine Translation with Attention](https://tensorflow.google.cn/tutorials/sequences/nmt_with_attention). It uses a similar architecture to translate between Spanish and English sentences. You can also experiment with training the code in this notebook on a different dataset.

# Fine-tuning a BERT model
> Source: [https://tensorflow.google.cn/official_models/fine_tuning_bert](https://tensorflow.google.cn/official_models/fine_tuning_bert)
In this example, we will work through fine-tuning a BERT model using the tensorflow-models PIP package.
The pretrained BERT model this tutorial is based on is also available on [TensorFlow Hub](https://tensorflow.org/hub); to see how to use it, refer to the [Hub Appendix](#hub_bert).
## Setup
### Install the TensorFlow Model Garden pip package
* `tf-models-official` is the stable Model Garden package. Note that it may not include the latest changes in the `tensorflow_models` github repo. To include latest changes, you may install `tf-models-nightly`, which is the nightly Model Garden package created daily automatically.
* pip will install all models and dependencies automatically.
```py
pip install -q tf-models-official==2.3.0
```
```py
WARNING: You are using pip version 20.2.3; however, version 20.2.4 is available.
You should consider upgrading via the '/tmpfs/src/tf_docs_env/bin/python -m pip install --upgrade pip' command.
```
### Imports
```py
import os
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds
tfds.disable_progress_bar()
from official.modeling import tf_utils
from official import nlp
from official.nlp import bert
# Load the required submodules
import official.nlp.optimization
import official.nlp.bert.bert_models
import official.nlp.bert.configs
import official.nlp.bert.run_classifier
import official.nlp.bert.tokenization
import official.nlp.data.classifier_data_lib
import official.nlp.modeling.losses
import official.nlp.modeling.models
import official.nlp.modeling.networks
```
### Resources
This directory contains the configuration, vocabulary, and a pre-trained checkpoint used in this tutorial:
```py
gs_folder_bert = "gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-12_H-768_A-12"
tf.io.gfile.listdir(gs_folder_bert)
```
```py
['bert_config.json',
'bert_model.ckpt.data-00000-of-00001',
'bert_model.ckpt.index',
'vocab.txt']
```
You can get a pre-trained BERT encoder from [TensorFlow Hub](https://hub.tensorflow.google.cn/tensorflow/bert_en_uncased_L-12_H-768_A-12/2):
```py
hub_url_bert = "https://hub.tensorflow.google.cn/tensorflow/bert_en_uncased_L-12_H-768_A-12/2"
```
## The data
For this example we used the [GLUE MRPC dataset from TFDS](https://tensorflow.google.cn/datasets/catalog/glue#gluemrpc).
This dataset is not set up so that it can be directly fed into the BERT model, so this section also handles the necessary preprocessing.
### Get the dataset from TensorFlow Datasets
The Microsoft Research Paraphrase Corpus (Dolan & Brockett, 2005) is a corpus of sentence pairs automatically extracted from online news sources, with human annotations for whether the sentences in the pair are semantically equivalent.
* Number of labels: 2.
* Size of training dataset: 3668.
* Size of evaluation dataset: 408.
* Maximum sequence length of training and evaluation dataset: 128.
```py
glue, info = tfds.load('glue/mrpc', with_info=True,
                       # It's small, load the whole dataset
                       batch_size=-1)
```
```py
Downloading and preparing dataset glue/mrpc/1.0.0 (download: 1.43 MiB, generated: Unknown size, total: 1.43 MiB) to /home/kbuilder/tensorflow_datasets/glue/mrpc/1.0.0...
Shuffling and writing examples to /home/kbuilder/tensorflow_datasets/glue/mrpc/1.0.0.incompleteKZIBN9/glue-train.tfrecord
Shuffling and writing examples to /home/kbuilder/tensorflow_datasets/glue/mrpc/1.0.0.incompleteKZIBN9/glue-validation.tfrecord
Shuffling and writing examples to /home/kbuilder/tensorflow_datasets/glue/mrpc/1.0.0.incompleteKZIBN9/glue-test.tfrecord
Dataset glue downloaded and prepared to /home/kbuilder/tensorflow_datasets/glue/mrpc/1.0.0. Subsequent calls will reuse this data.
```
```py
list(glue.keys())
```
```py
['test', 'train', 'validation']
```
The `info` object describes the dataset and its features:
```py
info.features
```
```py
FeaturesDict({
'idx': tf.int32,
'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=2),
'sentence1': Text(shape=(), dtype=tf.string),
'sentence2': Text(shape=(), dtype=tf.string),
})
```
The two classes are:
```py
info.features['label'].names
```
```py
['not_equivalent', 'equivalent']
```
Here is one example from the training set:
```py
glue_train = glue['train']
for key, value in glue_train.items():
    print(f"{key:9s}: {value[0].numpy()}")
```
```py
idx : 1680
label : 0
sentence1: b'The identical rovers will act as robotic geologists , searching for evidence of past water .'
sentence2: b'The rovers act as robotic geologists , moving on six wheels .'
```
### The BERT tokenizer
To fine-tune a pre-trained model, you need to be sure that you're using exactly the same tokenization, vocabulary, and index mapping as were used during training.
The BERT tokenizer used in this tutorial is written in pure Python (It's not built out of TensorFlow ops). So you can't just plug it into your model as a `keras.layer` like you can with [`preprocessing.TextVectorization`](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/experimental/preprocessing/TextVectorization).
The following code rebuilds the tokenizer that was used by the base model:
```py
# Set up tokenizer to generate Tensorflow dataset
tokenizer = bert.tokenization.FullTokenizer(
    vocab_file=os.path.join(gs_folder_bert, "vocab.txt"),
    do_lower_case=True)
print("Vocab size:", len(tokenizer.vocab))
```
```py
Vocab size: 30522
```
Tokenize a sentence:
```py
tokens = tokenizer.tokenize("Hello TensorFlow!")
print(tokens)
ids = tokenizer.convert_tokens_to_ids(tokens)
print(ids)
```
```py
['hello', 'tensor', '##flow', '!']
[7592, 23435, 12314, 999]
```
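The `##` prefix marks sub-word continuations. The core of WordPiece is a greedy longest-match-first lookup, which can be sketched in plain Python; `toy_wordpiece` below is a hypothetical illustration with a made-up vocabulary, not the real tokenizer (which handles normalization, max word length, and other edge cases):

```python
def toy_wordpiece(word, vocab):
    """Greedy longest-match-first sub-word split, as used by WordPiece."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is found in the vocabulary.
        while end > start:
            piece = word[start:end] if start == 0 else '##' + word[start:end]
            if piece in vocab:
                pieces.append(piece)
                break
            end -= 1
        else:
            return ['[UNK]']  # no piece matched at this position
        start = end
    return pieces

vocab = {'tensor', '##flow', 'hello', '!'}
print(toy_wordpiece('tensorflow', vocab))  # ['tensor', '##flow']
```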
### Preprocess the data
This section manually preprocesses the dataset into the format expected by the model.
This dataset is small, so preprocessing can be done quickly and easily in memory. For larger datasets the `tf_models` library includes some tools for preprocessing and re-serializing a dataset. See [Appendix: Re-encoding a large dataset](#re_encoding_tools) for details.
#### Encode the sentences
The model expects its two input sentences to be concatenated together. The input is expected to start with a `[CLS]` "This is a classification problem" token, and each sentence should end with a `[SEP]` "Separator" token:
```py
tokenizer.convert_tokens_to_ids(['[CLS]', '[SEP]'])
```
```py
[101, 102]
```
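Independently of TensorFlow, this packing convention (`[CLS]`, then sentence 1 ending in `[SEP]`, then sentence 2 ending in `[SEP]`, with matching type ids) can be sketched in plain Python. This is a minimal sketch; `pack_pair` is a hypothetical helper, and the sentence token ids are illustrative, not real vocabulary entries:

```python
# [CLS] and [SEP] ids match the output above; sentence ids are made up.
CLS_ID, SEP_ID = 101, 102

def pack_pair(sent1_ids, sent2_ids):
    """Concatenate two tokenized sentences into BERT's single-sequence input."""
    input_word_ids = [CLS_ID] + sent1_ids + [SEP_ID] + sent2_ids + [SEP_ID]
    # Type ids: 0 for [CLS] and the first segment, 1 for the second segment.
    input_type_ids = [0] * (len(sent1_ids) + 2) + [1] * (len(sent2_ids) + 1)
    return input_word_ids, input_type_ids

ids, types = pack_pair([7592, 999], [23435, 12314])
print(ids)    # [101, 7592, 999, 102, 23435, 12314, 102]
print(types)  # [0, 0, 0, 0, 1, 1, 1]
```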
Start by encoding all the sentences while appending a `[SEP]` token, and packing them into ragged-tensors:
```py
def encode_sentence(s):
    tokens = list(tokenizer.tokenize(s.numpy()))
    tokens.append('[SEP]')
    return tokenizer.convert_tokens_to_ids(tokens)

sentence1 = tf.ragged.constant([
    encode_sentence(s) for s in glue_train["sentence1"]])
sentence2 = tf.ragged.constant([
    encode_sentence(s) for s in glue_train["sentence2"]])
```
```py
print("Sentence1 shape:", sentence1.shape.as_list())
print("Sentence2 shape:", sentence2.shape.as_list())
```
```py
Sentence1 shape: [3668, None]
Sentence2 shape: [3668, None]
```
Now prepend a `[CLS]` token, and concatenate the ragged tensors to form a single `input_word_ids` tensor for each example. [`RaggedTensor.to_tensor()`](https://tensorflow.google.cn/api_docs/python/tf/RaggedTensor#to_tensor) zero pads to the longest sequence.
```py
cls = [tokenizer.convert_tokens_to_ids(['[CLS]'])]*sentence1.shape[0]
input_word_ids = tf.concat([cls, sentence1, sentence2], axis=-1)
_ = plt.pcolormesh(input_word_ids.to_tensor())
```
![png](img/10d71bce93ec45ba7076ef15a37bcb28.png)
#### Mask and input type
The model expects two additional inputs:
* The input mask
* The input type
The mask allows the model to cleanly differentiate between the content and the padding. The mask has the same shape as the `input_word_ids`, and contains a `1` anywhere the `input_word_ids` is not padding.
```py
input_mask = tf.ones_like(input_word_ids).to_tensor()
plt.pcolormesh(input_mask)
```
```py
<matplotlib.collections.QuadMesh at 0x7fad1c07ed30>
```
![png](img/1f9a0765029471b20952ac80887f73a4.png)
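The zero padding performed by `RaggedTensor.to_tensor()` and this mask convention can be sketched together without TensorFlow; `pad_and_mask` is a hypothetical helper for illustration only:

```python
def pad_and_mask(ragged_rows):
    """Pad variable-length id rows with zeros and build the matching 1/0 mask."""
    max_len = max(len(row) for row in ragged_rows)
    padded = [row + [0] * (max_len - len(row)) for row in ragged_rows]
    mask = [[1] * len(row) + [0] * (max_len - len(row)) for row in ragged_rows]
    return padded, mask

padded, mask = pad_and_mask([[101, 7592, 102], [101, 102]])
print(padded)  # [[101, 7592, 102], [101, 102, 0]]
print(mask)    # [[1, 1, 1], [1, 1, 0]]
```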
The "input type" also has the same shape, but inside the non-padded region, contains a `0` or a `1` indicating which sentence the token is a part of.
```py
type_cls = tf.zeros_like(cls)
type_s1 = tf.zeros_like(sentence1)
type_s2 = tf.ones_like(sentence2)
input_type_ids = tf.concat([type_cls, type_s1, type_s2], axis=-1).to_tensor()
plt.pcolormesh(input_type_ids)
```
```py
<matplotlib.collections.QuadMesh at 0x7fad143c1710>
```
![png](img/e06760b4112e8fd989cdb1f7a948bc17.png)
#### Put it all together
Collect the above text parsing code into a single function, and apply it to each split of the `glue/mrpc` dataset.
```py
def encode_sentence(s, tokenizer):
    tokens = list(tokenizer.tokenize(s))
    tokens.append('[SEP]')
    return tokenizer.convert_tokens_to_ids(tokens)

def bert_encode(glue_dict, tokenizer):
    num_examples = len(glue_dict["sentence1"])

    sentence1 = tf.ragged.constant([
        encode_sentence(s, tokenizer)
        for s in np.array(glue_dict["sentence1"])])
    sentence2 = tf.ragged.constant([
        encode_sentence(s, tokenizer)
        for s in np.array(glue_dict["sentence2"])])

    cls = [tokenizer.convert_tokens_to_ids(['[CLS]'])]*sentence1.shape[0]
    input_word_ids = tf.concat([cls, sentence1, sentence2], axis=-1)

    input_mask = tf.ones_like(input_word_ids).to_tensor()

    type_cls = tf.zeros_like(cls)
    type_s1 = tf.zeros_like(sentence1)
    type_s2 = tf.ones_like(sentence2)
    input_type_ids = tf.concat(
        [type_cls, type_s1, type_s2], axis=-1).to_tensor()

    inputs = {
        'input_word_ids': input_word_ids.to_tensor(),
        'input_mask': input_mask,
        'input_type_ids': input_type_ids}

    return inputs
```
```py
glue_train = bert_encode(glue['train'], tokenizer)
glue_train_labels = glue['train']['label']
glue_validation = bert_encode(glue['validation'], tokenizer)
glue_validation_labels = glue['validation']['label']
glue_test = bert_encode(glue['test'], tokenizer)
glue_test_labels = glue['test']['label']
```
Each subset of the data has been converted to a dictionary of features, and a set of labels. Each feature in the input dictionary has the same shape, and the number of labels should match:
```py
for key, value in glue_train.items():
    print(f'{key:15s} shape: {value.shape}')
print(f'glue_train_labels shape: {glue_train_labels.shape}')
```
```py
input_word_ids shape: (3668, 103)
input_mask shape: (3668, 103)
input_type_ids shape: (3668, 103)
glue_train_labels shape: (3668,)
```
## The model
### Build the model
The first step is to download the configuration for the pre-trained model.
```py
import json
bert_config_file = os.path.join(gs_folder_bert, "bert_config.json")
config_dict = json.loads(tf.io.gfile.GFile(bert_config_file).read())
bert_config = bert.configs.BertConfig.from_dict(config_dict)
config_dict
```
```py
{'attention_probs_dropout_prob': 0.1,
'hidden_act': 'gelu',
'hidden_dropout_prob': 0.1,
'hidden_size': 768,
'initializer_range': 0.02,
'intermediate_size': 3072,
'max_position_embeddings': 512,
'num_attention_heads': 12,
'num_hidden_layers': 12,
'type_vocab_size': 2,
'vocab_size': 30522}
```
The `config` defines the core BERT model, a Keras model that predicts outputs for `num_classes` classes from inputs with maximum sequence length `max_seq_length`.
This function returns both the encoder and the classifier.
```py
bert_classifier, bert_encoder = bert.bert_models.classifier_model(
    bert_config, num_labels=2)
```
The classifier has three inputs and one output:
```py
tf.keras.utils.plot_model(bert_classifier, show_shapes=True, dpi=48)
```
![png](img/906a04e5434908ec33033e39f2e83f6b.png)
Run it on a test batch of 10 examples from the training set. The output is the logits for the two classes:
```py
glue_batch = {key: val[:10] for key, val in glue_train.items()}
bert_classifier(
glue_batch, training=True
).numpy()
```
```py
array([[ 0.08382261, 0.34465584],
[ 0.02057236, 0.24053624],
[ 0.04930754, 0.1117427 ],
[ 0.17041089, 0.20810834],
[ 0.21667874, 0.2840511 ],
[ 0.02325345, 0.33799925],
[-0.06198866, 0.13532838],
[ 0.084592 , 0.20711854],
[-0.04323687, 0.17096342],
[ 0.23759182, 0.16801538]], dtype=float32)
```
The `TransformerEncoder` in the center of the classifier above **is** the `bert_encoder`.
Inspecting the encoder, we see its stack of `Transformer` layers connected to those same three inputs:
```py
tf.keras.utils.plot_model(bert_encoder, show_shapes=True, dpi=48)
```
![png](img/6d5e829de3a867f7bb56dff003b7e217.png)
### Restore the encoder weights
When built, the encoder is randomly initialized. Restore the encoder's weights from the checkpoint:
```py
checkpoint = tf.train.Checkpoint(model=bert_encoder)
checkpoint.restore(
    os.path.join(gs_folder_bert, 'bert_model.ckpt')).assert_consumed()
```
```py
<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7fad4580ffd0>
```
**Note:** The pretrained `TransformerEncoder` is also available on [TensorFlow Hub](https://tensorflow.org/hub). See the [Hub appendix](#hub_bert) for details.
### Set up the optimizer
BERT adopts the Adam optimizer with weight decay (aka "[AdamW](https://arxiv.org/abs/1711.05101)"). It also employs a learning rate schedule that first warms up from 0 and then decays to 0.
```py
# Set up epochs and steps
epochs = 3
batch_size = 32
eval_batch_size = 32
train_data_size = len(glue_train_labels)
steps_per_epoch = int(train_data_size / batch_size)
num_train_steps = steps_per_epoch * epochs
warmup_steps = int(epochs * train_data_size * 0.1 / batch_size)
# creates an optimizer with learning rate schedule
optimizer = nlp.optimization.create_optimizer(
    2e-5, num_train_steps=num_train_steps, num_warmup_steps=warmup_steps)
```
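With the MRPC numbers used here (3668 training examples, batch size 32, 3 epochs), the step arithmetic works out as a quick plain-Python sanity check:

```python
train_data_size = 3668   # size of the MRPC training split
batch_size = 32
epochs = 3

steps_per_epoch = int(train_data_size / batch_size)
num_train_steps = steps_per_epoch * epochs
warmup_steps = int(epochs * train_data_size * 0.1 / batch_size)  # 10% of steps

print(steps_per_epoch, num_train_steps, warmup_steps)  # 114 342 34
```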
This returns an `AdamWeightDecay` optimizer with the learning rate schedule set:
```py
type(optimizer)
```
```py
official.nlp.optimization.AdamWeightDecay
```
To see an example of how to customize the optimizer and its schedule, see the [Optimizer schedule appendix](#optiizer_schedule).
### Train the model
The metric is accuracy and we use sparse categorical cross-entropy as loss.
```py
metrics = [tf.keras.metrics.SparseCategoricalAccuracy('accuracy', dtype=tf.float32)]
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
bert_classifier.compile(
    optimizer=optimizer,
    loss=loss,
    metrics=metrics)

bert_classifier.fit(
    glue_train, glue_train_labels,
    validation_data=(glue_validation, glue_validation_labels),
    batch_size=32,
    epochs=epochs)
```
```py
Epoch 1/3
115/115 [==============================] - 26s 222ms/step - loss: 0.6151 - accuracy: 0.6611 - val_loss: 0.5462 - val_accuracy: 0.7451
Epoch 2/3
115/115 [==============================] - 24s 212ms/step - loss: 0.4447 - accuracy: 0.8010 - val_loss: 0.4150 - val_accuracy: 0.8309
Epoch 3/3
115/115 [==============================] - 24s 213ms/step - loss: 0.2830 - accuracy: 0.8964 - val_loss: 0.3697 - val_accuracy: 0.8480
<tensorflow.python.keras.callbacks.History at 0x7fad000ebda0>
```
Now run the fine-tuned model on a custom example to see that it works.
Start by encoding some sentence pairs:
```py
my_examples = bert_encode(
    glue_dict = {
        'sentence1':[
            'The rain in Spain falls mainly on the plain.',
            'Look I fine tuned BERT.'],
        'sentence2':[
            'It mostly rains on the flat lands of Spain.',
            'Is it working? This does not match.']
    },
    tokenizer=tokenizer)
```
The model should report class `1` "match" for the first example and class `0` "no-match" for the second:
```py
result = bert_classifier(my_examples, training=False)
result = tf.argmax(result).numpy()
result
```
```py
array([1, 0])
```
```py
np.array(info.features['label'].names)[result]
```
```py
array(['equivalent', 'not_equivalent'], dtype='<U14')
```
### Save the model
Often the goal of training a model is to *use* it for something, so export the model and then restore it to be sure that it works.
```py
export_dir='./saved_model'
tf.saved_model.save(bert_classifier, export_dir=export_dir)
```
```py
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/training/tracking/tracking.py:111: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/training/tracking/tracking.py:111: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/training/tracking/tracking.py:111: Layer.updates (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
Warning:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.6/site-packages/tensorflow/python/training/tracking/tracking.py:111: Layer.updates (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
INFO:tensorflow:Assets written to: ./saved_model/assets
INFO:tensorflow:Assets written to: ./saved_model/assets
```
```py
reloaded = tf.saved_model.load(export_dir)
reloaded_result = reloaded([my_examples['input_word_ids'],
                            my_examples['input_mask'],
                            my_examples['input_type_ids']], training=False)
original_result = bert_classifier(my_examples, training=False)
# The results are (nearly) identical:
print(original_result.numpy())
print()
print(reloaded_result.numpy())
```
```py
[[-0.95450354 1.1227685 ]
[ 0.40344787 -0.58954155]]
[[-0.95450354 1.1227684 ]
[ 0.4034478 -0.5895414 ]]
```
## Appendix
### Re-encoding a large dataset
In this tutorial, you re-encoded the dataset in memory for clarity.
This was only possible because `glue/mrpc` is a very small dataset. To deal with larger datasets, the `tf_models` library includes some tools for processing and re-encoding a dataset for efficient training.
The first step is to describe which features of the dataset should be transformed:
```py
processor = nlp.data.classifier_data_lib.TfdsProcessor(
    tfds_params="dataset=glue/mrpc,text_key=sentence1,text_b_key=sentence2",
    process_text_fn=bert.tokenization.convert_to_unicode)
```
Then apply the transformation to generate new TFRecord files.
```py
# Set up output of training and evaluation Tensorflow dataset
train_data_output_path="./mrpc_train.tf_record"
eval_data_output_path="./mrpc_eval.tf_record"
max_seq_length = 128
batch_size = 32
eval_batch_size = 32
# Generate and save training data into a tf record file
input_meta_data = (
    nlp.data.classifier_data_lib.generate_tf_record_from_data_file(
        processor=processor,
        data_dir=None,  # It is `None` because data is from tfds, not local dir.
        tokenizer=tokenizer,
        train_data_output_path=train_data_output_path,
        eval_data_output_path=eval_data_output_path,
        max_seq_length=max_seq_length))
```
Finally create [`tf.data`](https://tensorflow.google.cn/api_docs/python/tf/data) input pipelines from those TFRecord files:
```py
training_dataset = bert.run_classifier.get_dataset_fn(
    train_data_output_path,
    max_seq_length,
    batch_size,
    is_training=True)()

evaluation_dataset = bert.run_classifier.get_dataset_fn(
    eval_data_output_path,
    max_seq_length,
    eval_batch_size,
    is_training=False)()
```
The resulting `tf.data.Datasets` return `(features, labels)` pairs, as expected by [`keras.Model.fit`](https://tensorflow.google.cn/api_docs/python/tf/keras/Model#fit):
```py
training_dataset.element_spec
```
```py
({'input_word_ids': TensorSpec(shape=(32, 128), dtype=tf.int32, name=None),
'input_mask': TensorSpec(shape=(32, 128), dtype=tf.int32, name=None),
'input_type_ids': TensorSpec(shape=(32, 128), dtype=tf.int32, name=None)},
TensorSpec(shape=(32,), dtype=tf.int32, name=None))
```
#### Create tf.data.Dataset for training and evaluation
If you need to modify the data loading, here is some code to get you started:
```py
def create_classifier_dataset(file_path, seq_length, batch_size, is_training):
    """Creates input dataset from (tf)records files for train/eval."""
    dataset = tf.data.TFRecordDataset(file_path)
    if is_training:
        dataset = dataset.shuffle(100)
        dataset = dataset.repeat()

    def decode_record(record):
        name_to_features = {
            'input_ids': tf.io.FixedLenFeature([seq_length], tf.int64),
            'input_mask': tf.io.FixedLenFeature([seq_length], tf.int64),
            'segment_ids': tf.io.FixedLenFeature([seq_length], tf.int64),
            'label_ids': tf.io.FixedLenFeature([], tf.int64),
        }
        return tf.io.parse_single_example(record, name_to_features)

    def _select_data_from_record(record):
        x = {
            'input_word_ids': record['input_ids'],
            'input_mask': record['input_mask'],
            'input_type_ids': record['segment_ids']
        }
        y = record['label_ids']
        return (x, y)

    dataset = dataset.map(decode_record,
                          num_parallel_calls=tf.data.experimental.AUTOTUNE)
    dataset = dataset.map(
        _select_data_from_record,
        num_parallel_calls=tf.data.experimental.AUTOTUNE)

    dataset = dataset.batch(batch_size, drop_remainder=is_training)
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    return dataset
```
```py
# Set up batch sizes
batch_size = 32
eval_batch_size = 32
# Return Tensorflow dataset
training_dataset = create_classifier_dataset(
    train_data_output_path,
    input_meta_data['max_seq_length'],
    batch_size,
    is_training=True)

evaluation_dataset = create_classifier_dataset(
    eval_data_output_path,
    input_meta_data['max_seq_length'],
    eval_batch_size,
    is_training=False)
```
```py
training_dataset.element_spec
```
```py
({'input_word_ids': TensorSpec(shape=(32, 128), dtype=tf.int64, name=None),
'input_mask': TensorSpec(shape=(32, 128), dtype=tf.int64, name=None),
'input_type_ids': TensorSpec(shape=(32, 128), dtype=tf.int64, name=None)},
TensorSpec(shape=(32,), dtype=tf.int64, name=None))
```
### TFModels BERT on TFHub
You can get [the BERT model](https://hub.tensorflow.google.cn/tensorflow/bert_en_uncased_L-12_H-768_A-12/2) off the shelf from [TFHub](https://tensorflow.org/hub). It would not be hard to add a classification head on top of this [`hub.KerasLayer`](https://tensorflow.google.cn/hub/api_docs/python/hub/KerasLayer).
```py
# Note: 350MB download.
import tensorflow_hub as hub
```
```py
hub_model_name = "bert_en_uncased_L-12_H-768_A-12"
```
```py
hub_encoder = hub.KerasLayer(f"https://hub.tensorflow.google.cn/tensorflow/{hub_model_name}/2",
                             trainable=True)
print(f"The Hub encoder has {len(hub_encoder.trainable_variables)} trainable variables")
```
```py
The Hub encoder has 199 trainable variables
```
Test run it on a batch of data:
```py
result = hub_encoder(
    inputs=[glue_train['input_word_ids'][:10],
            glue_train['input_mask'][:10],
            glue_train['input_type_ids'][:10],],
    training=False,
)
print("Pooled output shape:", result[0].shape)
print("Sequence output shape:", result[1].shape)
```
```py
Pooled output shape: (10, 768)
Sequence output shape: (10, 103, 768)
```
At this point it would be simple to add a classification head yourself.
The `bert_models.classifier_model` function can also build a classifier onto the encoder from TensorFlow Hub:
```py
hub_classifier, hub_encoder = bert.bert_models.classifier_model(
    # Caution: Most of `bert_config` is ignored if you pass a hub url.
    bert_config=bert_config, hub_module_url=hub_url_bert, num_labels=2)
```
The one downside to loading this model from TFHub is that the structure of the internal Keras layers is not restored, so it's more difficult to inspect or modify the model. The `TransformerEncoder` model is now a single layer:
```py
tf.keras.utils.plot_model(hub_classifier, show_shapes=True, dpi=64)
```
![png](img/563b223dd04889d1963c53d7c10dfa02.png)
```py
try:
    tf.keras.utils.plot_model(hub_encoder, show_shapes=True, dpi=64)
    assert False
except Exception as e:
    print(f"{type(e).__name__}: {e}")
```
```py
AttributeError: 'KerasLayer' object has no attribute 'layers'
```
### Low level model building
If you need more control over the construction of the model, it's worth noting that the `classifier_model` function used earlier is really just a thin wrapper over the `nlp.modeling.networks.TransformerEncoder` and `nlp.modeling.models.BertClassifier` classes. Just remember that if you start modifying the architecture, it may not be correct or possible to reload the pre-trained checkpoint, so you'll need to retrain from scratch.
Build the encoder:
```py
transformer_config = config_dict.copy()
# You need to rename a few fields to make this work:
transformer_config['attention_dropout_rate'] = transformer_config.pop('attention_probs_dropout_prob')
transformer_config['activation'] = tf_utils.get_activation(transformer_config.pop('hidden_act'))
transformer_config['dropout_rate'] = transformer_config.pop('hidden_dropout_prob')
transformer_config['initializer'] = tf.keras.initializers.TruncatedNormal(
    stddev=transformer_config.pop('initializer_range'))
transformer_config['max_sequence_length'] = transformer_config.pop('max_position_embeddings')
transformer_config['num_layers'] = transformer_config.pop('num_hidden_layers')
transformer_config
```
```py
{'hidden_size': 768,
'intermediate_size': 3072,
'num_attention_heads': 12,
'type_vocab_size': 2,
'vocab_size': 30522,
'attention_dropout_rate': 0.1,
'activation': <function official.modeling.activations.gelu.gelu(x)>,
'dropout_rate': 0.1,
'initializer': <tensorflow.python.keras.initializers.initializers_v2.TruncatedNormal at 0x7fac08046e10>,
'max_sequence_length': 512,
'num_layers': 12}
```
```py
manual_encoder = nlp.modeling.networks.TransformerEncoder(**transformer_config)
```
Restore the weights:
```py
checkpoint = tf.train.Checkpoint(model=manual_encoder)
checkpoint.restore(
    os.path.join(gs_folder_bert, 'bert_model.ckpt')).assert_consumed()
```
```py
<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7fabefa596d8>
```
Test run it:
```py
result = manual_encoder(my_examples, training=True)
print("Sequence output shape:", result[0].shape)
print("Pooled output shape:", result[1].shape)
```
```py
Sequence output shape: (2, 23, 768)
Pooled output shape: (2, 768)
```
Wrap it in a classifier:
```py
manual_classifier = nlp.modeling.models.BertClassifier(
    bert_encoder,
    num_classes=2,
    dropout_rate=transformer_config['dropout_rate'],
    initializer=tf.keras.initializers.TruncatedNormal(
        stddev=bert_config.initializer_range))
```
```py
manual_classifier(my_examples, training=True).numpy()
```
```py
array([[ 0.07863025, -0.02940944],
[ 0.30274656, 0.27299827]], dtype=float32)
```
### Optimizers and schedules
The optimizer used to train the model was created using the `nlp.optimization.create_optimizer` function:
```py
optimizer = nlp.optimization.create_optimizer(
    2e-5, num_train_steps=num_train_steps, num_warmup_steps=warmup_steps)
```
That high-level wrapper sets up the learning rate schedule and the optimizer.
The base learning rate schedule used here is a linear decay to zero over the training run:
```py
epochs = 3
batch_size = 32
eval_batch_size = 32
train_data_size = len(glue_train_labels)
steps_per_epoch = int(train_data_size / batch_size)
num_train_steps = steps_per_epoch * epochs
```
```py
decay_schedule = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=2e-5,
    decay_steps=num_train_steps,
    end_learning_rate=0)

plt.plot([decay_schedule(n) for n in range(num_train_steps)])
```
```py
[<matplotlib.lines.Line2D at 0x7fabef5e69e8>]
```
![png](img/868f946086995ef931b7b454d904e14b.png)
This, in turn, is wrapped in a `WarmUp` schedule that linearly increases the learning rate to the target value over the first 10% of training:
```py
warmup_steps = num_train_steps * 0.1
warmup_schedule = nlp.optimization.WarmUp(
    initial_learning_rate=2e-5,
    decay_schedule_fn=decay_schedule,
    warmup_steps=warmup_steps)

# The warmup overshoots, because it warms up to the `initial_learning_rate`
# following the original implementation. You can set
# `initial_learning_rate=decay_schedule(warmup_steps)` if you don't like the
# overshoot.
plt.plot([warmup_schedule(n) for n in range(num_train_steps)])
```
```py
[<matplotlib.lines.Line2D at 0x7fabef559630>]
```
![png](img/c542bc6784512a8abdc2e3a85a1e1905.png)
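The piecewise shape of the combined schedule can be sketched in a few lines of plain Python. This is an illustrative re-implementation under assumed behavior, not the `nlp.optimization` code: ramp linearly up to the target rate over the warmup steps, then hand off to the linear decay.

```python
def linear_decay(step, initial_lr=2e-5, decay_steps=1000, end_lr=0.0):
    """Linear interpolation from initial_lr down to end_lr."""
    step = min(step, decay_steps)
    return end_lr + (initial_lr - end_lr) * (1.0 - step / decay_steps)

def warmed_up_lr(step, initial_lr=2e-5, warmup_steps=100, decay_steps=1000):
    """Ramp linearly to initial_lr, then follow the decay schedule."""
    if step < warmup_steps:
        return initial_lr * step / warmup_steps
    return linear_decay(step, initial_lr, decay_steps)
```

Note that `warmed_up_lr(warmup_steps - 1)` sits close to `initial_lr` while `linear_decay(warmup_steps)` has already dropped below it, which is exactly the overshoot described in the code comment above.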
Then create the `nlp.optimization.AdamWeightDecay` using that schedule, configured for the BERT model:
```py
optimizer = nlp.optimization.AdamWeightDecay(
learning_rate=warmup_schedule,
weight_decay_rate=0.01,
epsilon=1e-6,
exclude_from_weight_decay=['LayerNorm', 'layer_norm', 'bias'])
```
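The distinguishing behavior of `AdamWeightDecay` is that the decay is *decoupled*: after the Adam update, each weight is shrunk directly, and variables matched by `exclude_from_weight_decay` are left alone. A toy sketch of one update step (my own illustration of the idea, not the library's implementation):

```python
def weight_decay_step(name, weight, adam_update, lr=2e-5,
                      weight_decay_rate=0.01,
                      exclude=('LayerNorm', 'layer_norm', 'bias')):
    """Apply an Adam update, then decoupled weight decay for eligible variables."""
    new_weight = weight - lr * adam_update
    if not any(token in name for token in exclude):
        new_weight -= lr * weight_decay_rate * weight
    return new_weight

# Kernels are decayed toward zero; biases and LayerNorm parameters are not.
print(weight_decay_step('dense/kernel', 1.0, 0.0))  # slightly below 1.0
print(weight_decay_step('dense/bias', 1.0, 0.0))    # exactly 1.0
```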

# Generative

# Neural style transfer
> Original: [https://tensorflow.google.cn/tutorials/generative/style_transfer](https://tensorflow.google.cn/tutorials/generative/style_transfer)
**Note:** Our TensorFlow community has translated these documents. Because community translations are best-effort, there is no guarantee that they are an accurate and up-to-date reflection of the [official English documentation](https://tensorflow.google.cn/?hl=en). If you have suggestions to improve this translation, please submit a pull request to the [tensorflow/docs](https://github.com/tensorflow/docs) GitHub repository. To volunteer to write or review translations, join the [docs-zh-cn@tensorflow.org Google Group](https://groups.google.com/a/tensorflow.org/forum/#!forum/docs-zh-cn).
This tutorial uses deep learning to compose one image in the style of another (ever wish you could paint like Picasso or Van Gogh?). This is known as *neural style transfer*, a technique outlined in [A Neural Algorithm of Artistic Style](https://arxiv.org/abs/1508.06576) (Gatys et al.).
**Note:** This tutorial demonstrates the original style-transfer algorithm, which optimizes the image content to a particular style. Modern approaches train a model to generate the stylized image directly (similar to [cyclegan](/tutorials/generative/cyclegan)); that approach is much faster (up to 1000x). Pretrained [Arbitrary Image Stylization modules](https://colab.sandbox.google.com/github/tensorflow/hub/blob/master/examples/colab/tf2_arbitrary_image_stylization.ipynb) are available from [TensorFlow Hub](https://tensorflow.google.cn/hub) and [TensorFlow Lite](https://tensorflow.google.cn/lite/models/style_transfer/overview).
Neural style transfer is an optimization technique used to take two images, a *content* image and a *style reference* image (such as an artwork by a famous painter), and blend them together so the output image looks like the content image, but "painted" in the style of the style reference image.
This is implemented by optimizing the output image to match the content statistics of the content image and the style statistics of the style reference image. These statistics are extracted from the images using a convolutional network.
For example, let's take this photo of a dog and Wassily Kandinsky's Composition 7:
![](img/8d456c03cff000c86147a07dbbcb6f32.png)
[Yellow Labrador Looking](https://commons.wikimedia.org/wiki/File:YellowLabradorLooking_new.jpg), from Wikimedia Commons
![](img/35253af9a3f5a4e0035787fd80b11ca3.png)
What would it look like if Kandinsky decided to paint this dog exclusively in this style? Something like this?
![](img/40793e753f5cc525c8f3c9cd20d1085c.png)
## Setup
### Import and configure modules
```py
import tensorflow as tf
```
```py
import IPython.display as display
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = (12,12)
mpl.rcParams['axes.grid'] = False
import numpy as np
import PIL.Image
import time
import functools
```
```py
def tensor_to_image(tensor):
tensor = tensor*255
tensor = np.array(tensor, dtype=np.uint8)
if np.ndim(tensor)>3:
assert tensor.shape[0] == 1
tensor = tensor[0]
return PIL.Image.fromarray(tensor)
```
Download images and choose a style image and a content image:
```py
content_path = tf.keras.utils.get_file('YellowLabradorLooking_new.jpg', 'https://storage.googleapis.com/download.tensorflow.org/example_images/YellowLabradorLooking_new.jpg')
# https://commons.wikimedia.org/wiki/File:Vassily_Kandinsky,_1913_-_Composition_7.jpg
style_path = tf.keras.utils.get_file('kandinsky5.jpg','https://storage.googleapis.com/download.tensorflow.org/example_images/Vassily_Kandinsky%2C_1913_-_Composition_7.jpg')
```
```py
Downloading data from https://storage.googleapis.com/download.tensorflow.org/example_images/YellowLabradorLooking_new.jpg
90112/83281 [================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/download.tensorflow.org/example_images/Vassily_Kandinsky%2C_1913_-_Composition_7.jpg
196608/195196 [==============================] - 0s 0us/step
```
## Visualize the input
Define a function to load an image and limit its maximum dimension to 512 pixels.
```py
def load_img(path_to_img):
max_dim = 512
img = tf.io.read_file(path_to_img)
img = tf.image.decode_image(img, channels=3)
img = tf.image.convert_image_dtype(img, tf.float32)
shape = tf.cast(tf.shape(img)[:-1], tf.float32)
long_dim = max(shape)
scale = max_dim / long_dim
new_shape = tf.cast(shape * scale, tf.int32)
img = tf.image.resize(img, new_shape)
img = img[tf.newaxis, :]
return img
```
Create a simple function to display an image:
```py
def imshow(image, title=None):
if len(image.shape) > 3:
image = tf.squeeze(image, axis=0)
plt.imshow(image)
if title:
plt.title(title)
```
```py
content_image = load_img(content_path)
style_image = load_img(style_path)
plt.subplot(1, 2, 1)
imshow(content_image, 'Content Image')
plt.subplot(1, 2, 2)
imshow(style_image, 'Style Image')
```
![png](img/d85fdaff014f0211e5ef646977087e50.png)
## Fast Style Transfer using TF-Hub
This tutorial demonstrates the original style-transfer algorithm, which optimizes the image content to a particular style. Before getting into the details, let's see how the [TensorFlow Hub](https://tensorflow.google.cn/hub) module does this:
```py
import tensorflow_hub as hub
hub_module = hub.load('https://hub.tensorflow.google.cn/google/magenta/arbitrary-image-stylization-v1-256/1')
stylized_image = hub_module(tf.constant(content_image), tf.constant(style_image))[0]
tensor_to_image(stylized_image)
```
![png](img/833d9eeff633ce77dec2eb85f74e8bbb.png)
## Define content and style representations
Use the intermediate layers of the model to get the *content* and *style* representations of the image. Starting from the network's input layer, the first few layer activations represent low-level features such as edges and textures. As you step through the network, the final few layers represent higher-level features: object parts such as *wheels* or *eyes*. In this tutorial we use the VGG19 architecture, a pretrained image classification network. Its intermediate layers are what we need to define the content and style representations of our images. For an input image, we try to match the corresponding style and content target representations at these intermediate layers.
Load a [VGG19](https://keras.io/applications/#vgg19) and test run it on our image to make sure it works correctly:
```py
x = tf.keras.applications.vgg19.preprocess_input(content_image*255)
x = tf.image.resize(x, (224, 224))
vgg = tf.keras.applications.VGG19(include_top=True, weights='imagenet')
prediction_probabilities = vgg(x)
prediction_probabilities.shape
```
```py
TensorShape([1, 1000])
```
```py
predicted_top_5 = tf.keras.applications.vgg19.decode_predictions(prediction_probabilities.numpy())[0]
[(class_name, prob) for (number, class_name, prob) in predicted_top_5]
```
```py
Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/imagenet_class_index.json
40960/35363 [==================================] - 0s 0us/step
[('Labrador_retriever', 0.493171),
('golden_retriever', 0.23665288),
('kuvasz', 0.036357544),
('Chesapeake_Bay_retriever', 0.024182763),
('Greater_Swiss_Mountain_dog', 0.0186461)]
```
Now load a `VGG19` without the classification head, and list the layer names:
```py
vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
print()
for layer in vgg.layers:
print(layer.name)
```
```py
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg19/vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5
80142336/80134624 [==============================] - 1s 0us/step
input_2
block1_conv1
block1_conv2
block1_pool
block2_conv1
block2_conv2
block2_pool
block3_conv1
block3_conv2
block3_conv3
block3_conv4
block3_pool
block4_conv1
block4_conv2
block4_conv3
block4_conv4
block4_pool
block5_conv1
block5_conv2
block5_conv3
block5_conv4
block5_pool
```
Choose intermediate layers from the network to represent the style and content of the image:
```py
# The content layer that will extract our feature maps
content_layers = ['block5_conv2']
# The style layers we are interested in
style_layers = ['block1_conv1',
'block2_conv1',
'block3_conv1',
'block4_conv1',
'block5_conv1']
num_content_layers = len(content_layers)
num_style_layers = len(style_layers)
```
#### Intermediate layers for style and content
So why do these intermediate outputs within our pretrained image classification network allow us to define style and content representations?
At a high level, in order for a network to perform image classification (which this network has been trained to do), it must understand the image. This requires taking the raw image as input pixels and building an internal representation that converts the raw image pixels into a complex understanding of the features present within the image.
This is also a reason why convolutional neural networks are able to generalize well: they are able to capture the invariances and defining features within classes (e.g. cats vs. dogs) that are agnostic to background noise and other nuisances. Thus, somewhere between where the raw image is fed into the model and the classification label is output, the model serves as a complex feature extractor. By accessing intermediate layers of these models, we can describe the content and style of input images.
## Build the model
The networks in [`tf.keras.applications`](https://tensorflow.google.cn/api_docs/python/tf/keras/applications) are designed so that the intermediate layer values can easily be extracted using the Keras functional API.
To define a model using the functional API, specify the inputs and outputs:
`model = Model(inputs, outputs)`
The following function builds a VGG19 model that returns a list of intermediate layer outputs:
```py
def vgg_layers(layer_names):
""" Creates a vgg model that returns a list of intermediate output values."""
  # Load our model. Load a VGG pretrained on imagenet data
vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
vgg.trainable = False
outputs = [vgg.get_layer(name).output for name in layer_names]
model = tf.keras.Model([vgg.input], outputs)
return model
```
And to create the model:
```py
style_extractor = vgg_layers(style_layers)
style_outputs = style_extractor(style_image*255)
# Look at the statistics of each layer's output
for name, output in zip(style_layers, style_outputs):
print(name)
print(" shape: ", output.numpy().shape)
print(" min: ", output.numpy().min())
print(" max: ", output.numpy().max())
print(" mean: ", output.numpy().mean())
print()
```
```py
block1_conv1
shape: (1, 336, 512, 64)
min: 0.0
max: 835.5256
mean: 33.97525
block2_conv1
shape: (1, 168, 256, 128)
min: 0.0
max: 4625.8857
mean: 199.82687
block3_conv1
shape: (1, 84, 128, 256)
min: 0.0
max: 8789.239
mean: 230.78099
block4_conv1
shape: (1, 42, 64, 512)
min: 0.0
max: 21566.135
mean: 791.24005
block5_conv1
shape: (1, 21, 32, 512)
min: 0.0
max: 3189.2542
mean: 59.179478
```
## Calculate style
The content of an image is represented by the values of the intermediate feature maps.
It turns out that the style of an image can be described by the means and correlations across the different feature maps. Calculate a Gram matrix that includes this information by taking the outer product of the feature vector with itself at each location, and averaging that outer product over all locations. For a particular layer, the Gram matrix is calculated as:
$$G^l_{cd} = \frac{\sum_{ij} F^l_{ijc}(x)F^l_{ijd}(x)}{IJ}$$
This can be implemented concisely using the [`tf.linalg.einsum`](https://tensorflow.google.cn/api_docs/python/tf/einsum) function:
```py
def gram_matrix(input_tensor):
result = tf.linalg.einsum('bijc,bijd->bcd', input_tensor, input_tensor)
input_shape = tf.shape(input_tensor)
num_locations = tf.cast(input_shape[1]*input_shape[2], tf.float32)
return result/(num_locations)
```
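As a sanity check (not part of the original tutorial), the einsum above is equivalent to flattening the spatial dimensions and averaging the outer products of the channel vectors, which is easy to verify with NumPy on random data:

```python
import numpy as np

# A random stand-in for a feature map: batch=1, height=4, width=5, channels=3.
rng = np.random.default_rng(0)
F = rng.normal(size=(1, 4, 5, 3)).astype(np.float32)

# Einsum form, mirroring gram_matrix above.
gram_einsum = np.einsum('bijc,bijd->bcd', F, F) / (4 * 5)

# Explicit form: flatten the I*J locations and average the outer products.
flat = F.reshape(1, -1, 3)                      # (batch, I*J, channels)
gram_explicit = flat.transpose(0, 2, 1) @ flat / flat.shape[1]

assert gram_einsum.shape == (1, 3, 3)           # one (channels x channels) matrix
assert np.allclose(gram_einsum, gram_explicit, atol=1e-5)
```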
## Extract style and content
Build a model that returns the style and content tensors.
```py
class StyleContentModel(tf.keras.models.Model):
def __init__(self, style_layers, content_layers):
super(StyleContentModel, self).__init__()
self.vgg = vgg_layers(style_layers + content_layers)
self.style_layers = style_layers
self.content_layers = content_layers
self.num_style_layers = len(style_layers)
self.vgg.trainable = False
def call(self, inputs):
"Expects float input in [0,1]"
inputs = inputs*255.0
preprocessed_input = tf.keras.applications.vgg19.preprocess_input(inputs)
outputs = self.vgg(preprocessed_input)
style_outputs, content_outputs = (outputs[:self.num_style_layers],
outputs[self.num_style_layers:])
style_outputs = [gram_matrix(style_output)
for style_output in style_outputs]
content_dict = {content_name:value
for content_name, value
in zip(self.content_layers, content_outputs)}
style_dict = {style_name:value
for style_name, value
in zip(self.style_layers, style_outputs)}
return {'content':content_dict, 'style':style_dict}
```
Calling this model on an image returns the gram matrix (style) of the `style_layers` and the content of the `content_layers`:
```py
extractor = StyleContentModel(style_layers, content_layers)
results = extractor(tf.constant(content_image))
style_results = results['style']
print('Styles:')
for name, output in sorted(results['style'].items()):
print(" ", name)
print(" shape: ", output.numpy().shape)
print(" min: ", output.numpy().min())
print(" max: ", output.numpy().max())
print(" mean: ", output.numpy().mean())
print()
print("Contents:")
for name, output in sorted(results['content'].items()):
print(" ", name)
print(" shape: ", output.numpy().shape)
print(" min: ", output.numpy().min())
print(" max: ", output.numpy().max())
print(" mean: ", output.numpy().mean())
```
```py
Styles:
block1_conv1
shape: (1, 64, 64)
min: 0.0055228462
max: 28014.562
mean: 263.79025
block2_conv1
shape: (1, 128, 128)
min: 0.0
max: 61479.49
mean: 9100.949
block3_conv1
shape: (1, 256, 256)
min: 0.0
max: 545623.44
mean: 7660.976
block4_conv1
shape: (1, 512, 512)
min: 0.0
max: 4320502.0
mean: 134288.84
block5_conv1
shape: (1, 512, 512)
min: 0.0
max: 110005.34
mean: 1487.0381
Contents:
block5_conv2
shape: (1, 26, 32, 512)
min: 0.0
max: 2410.8796
mean: 13.764149
```
## Run gradient descent
With this style and content extractor, we can now implement the style transfer algorithm. Do this by calculating the mean squared error of the image's output relative to each target, then taking the weighted sum of these losses.
Set the style and content target values:
```py
style_targets = extractor(style_image)['style']
content_targets = extractor(content_image)['content']
```
Define a [`tf.Variable`](https://tensorflow.google.cn/api_docs/python/tf/Variable) to contain the image to optimize. To make this quick, initialize it with the content image (the [`tf.Variable`](https://tensorflow.google.cn/api_docs/python/tf/Variable) must be the same shape as the content image).
```py
image = tf.Variable(content_image)
```
Since this is a float image, define a function to keep the pixel values between 0 and 1:
```py
def clip_0_1(image):
return tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0)
```
Create an optimizer. The paper recommends LBFGS, but `Adam` works okay, too:
```py
opt = tf.optimizers.Adam(learning_rate=0.02, beta_1=0.99, epsilon=1e-1)
```
To optimize this, use a weighted combination of the two losses to get the total loss:
```py
style_weight=1e-2
content_weight=1e4
```
```py
def style_content_loss(outputs):
style_outputs = outputs['style']
content_outputs = outputs['content']
style_loss = tf.add_n([tf.reduce_mean((style_outputs[name]-style_targets[name])**2)
for name in style_outputs.keys()])
style_loss *= style_weight / num_style_layers
content_loss = tf.add_n([tf.reduce_mean((content_outputs[name]-content_targets[name])**2)
for name in content_outputs.keys()])
content_loss *= content_weight / num_content_layers
loss = style_loss + content_loss
return loss
```
Use [`tf.GradientTape`](https://tensorflow.google.cn/api_docs/python/tf/GradientTape) to update the image.
```py
@tf.function()
def train_step(image):
with tf.GradientTape() as tape:
outputs = extractor(image)
loss = style_content_loss(outputs)
grad = tape.gradient(loss, image)
opt.apply_gradients([(grad, image)])
image.assign(clip_0_1(image))
```
Now run a few steps to test:
```py
train_step(image)
train_step(image)
train_step(image)
tensor_to_image(image)
```
![png](img/643455194a29bfd2dc25c8821cbbf3b4.png)
Since it's working, perform a longer optimization:
```py
import time
start = time.time()
epochs = 10
steps_per_epoch = 100
step = 0
for n in range(epochs):
for m in range(steps_per_epoch):
step += 1
train_step(image)
print(".", end='')
display.clear_output(wait=True)
display.display(tensor_to_image(image))
print("Train step: {}".format(step))
end = time.time()
print("Total time: {:.1f}".format(end-start))
```
![png](img/867e80eb383cce30a1f013a43e465d02.png)
```py
Train step: 1000
Total time: 20.4
```
## Total variation loss
One downside to this basic implementation is that it produces a lot of high frequency artifacts. Decrease these by adding an explicit regularization term on the high frequency components of the image. In style transfer, this is often called the *total variation loss*:
```py
def high_pass_x_y(image):
x_var = image[:,:,1:,:] - image[:,:,:-1,:]
y_var = image[:,1:,:,:] - image[:,:-1,:,:]
return x_var, y_var
```
```py
x_deltas, y_deltas = high_pass_x_y(content_image)
plt.figure(figsize=(14,10))
plt.subplot(2,2,1)
imshow(clip_0_1(2*y_deltas+0.5), "Horizontal Deltas: Original")
plt.subplot(2,2,2)
imshow(clip_0_1(2*x_deltas+0.5), "Vertical Deltas: Original")
x_deltas, y_deltas = high_pass_x_y(image)
plt.subplot(2,2,3)
imshow(clip_0_1(2*y_deltas+0.5), "Horizontal Deltas: Styled")
plt.subplot(2,2,4)
imshow(clip_0_1(2*x_deltas+0.5), "Vertical Deltas: Styled")
```
![png](img/e3d2caa770c7f600fb5cdc2a95ad0e0a.png)
This shows how the high frequency components have increased.
Also, this high frequency component is essentially an edge detector. You can get similar output from the Sobel edge detector, for example:
```py
plt.figure(figsize=(14,10))
sobel = tf.image.sobel_edges(content_image)
plt.subplot(1,2,1)
imshow(clip_0_1(sobel[...,0]/4+0.5), "Horizontal Sobel-edges")
plt.subplot(1,2,2)
imshow(clip_0_1(sobel[...,1]/4+0.5), "Vertical Sobel-edges")
```
![png](img/03dad7eb5e1c97b1391c9925be7da416.png)
The regularization loss associated with this is the sum of the absolute values of these differences:
```py
def total_variation_loss(image):
x_deltas, y_deltas = high_pass_x_y(image)
return tf.reduce_sum(tf.abs(x_deltas)) + tf.reduce_sum(tf.abs(y_deltas))
```
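On a tiny hand-made image, the same quantity is easy to check by hand (a NumPy sketch for illustration; the pixel values here are arbitrary):

```python
import numpy as np

# A 1x2x2x1 "image" with pixel values [[0.0, 1.0], [3.0, 0.5]].
img = np.array([[[[0.0], [1.0]],
                 [[3.0], [0.5]]]])

x_deltas = img[:, :, 1:, :] - img[:, :, :-1, :]  # horizontal neighbor differences
y_deltas = img[:, 1:, :, :] - img[:, :-1, :, :]  # vertical neighbor differences

# |1-0| + |0.5-3| = 3.5 horizontally, |3-0| + |0.5-1| = 3.5 vertically.
tv = np.abs(x_deltas).sum() + np.abs(y_deltas).sum()
print(tv)  # 7.0
```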
```py
total_variation_loss(image).numpy()
```
```py
149342.6
```
That demonstrated what the total variation loss does, but there's no need to implement it yourself: TensorFlow includes a standard implementation:
```py
tf.image.total_variation(image).numpy()
```
```py
array([149342.6], dtype=float32)
```
## Re-run the optimization
Choose a weight for the `total_variation_loss`:
```py
total_variation_weight=30
```
Now include it in the `train_step` function:
```py
@tf.function()
def train_step(image):
with tf.GradientTape() as tape:
outputs = extractor(image)
loss = style_content_loss(outputs)
loss += total_variation_weight*tf.image.total_variation(image)
grad = tape.gradient(loss, image)
opt.apply_gradients([(grad, image)])
image.assign(clip_0_1(image))
```
Reinitialize the optimization variable:
```py
image = tf.Variable(content_image)
```
And run the optimization:
```py
import time
start = time.time()
epochs = 10
steps_per_epoch = 100
step = 0
for n in range(epochs):
for m in range(steps_per_epoch):
step += 1
train_step(image)
print(".", end='')
display.clear_output(wait=True)
display.display(tensor_to_image(image))
print("Train step: {}".format(step))
end = time.time()
print("Total time: {:.1f}".format(end-start))
```
![png](img/c67ce581d874e2d04e2761cc44b1d094.png)
```py
Train step: 1000
Total time: 21.7
```
Finally, save the result:
```py
file_name = 'stylized-image.png'
tensor_to_image(image).save(file_name)
try:
from google.colab import files
except ImportError:
pass
else:
files.download(file_name)
```

# DeepDream
> Original: [https://tensorflow.google.cn/tutorials/generative/deepdream](https://tensorflow.google.cn/tutorials/generative/deepdream)
This tutorial contains a minimal implementation of DeepDream, as described in this [blog post](https://ai.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html) by Alexander Mordvintsev.
DeepDream is an experiment that visualizes the patterns learned by a neural network. Similar to when a child watches clouds and tries to interpret random shapes, DeepDream over-interprets and enhances the patterns it sees in an image.
It does so by forwarding an image through the network, then calculating the gradient of the image with respect to the activations of a particular layer. The image is then modified to increase these activations, enhancing the patterns seen by the network, and resulting in a dream-like image. This process was dubbed "Inceptionism" (a reference to [InceptionNet](https://arxiv.org/pdf/1409.4842.pdf), and the [movie](https://en.wikipedia.org/wiki/Inception) Inception).
Let's demonstrate how you can make a neural network "dream" and enhance the surreal patterns it sees in an image.
![Dogception](img/ad462e5b3dc8d32430aaa7de7e4bf303.png)
```py
import tensorflow as tf
```
```py
import numpy as np
import matplotlib as mpl
import IPython.display as display
import PIL.Image
from tensorflow.keras.preprocessing import image
```
## Choose an image to dream-ify
For this tutorial, let's use an image of a [labrador](https://commons.wikimedia.org/wiki/File:YellowLabradorLooking_new.jpg).
```py
url = 'https://storage.googleapis.com/download.tensorflow.org/example_images/YellowLabradorLooking_new.jpg'
```
```py
# Download an image and read it into a NumPy array.
def download(url, max_dim=None):
name = url.split('/')[-1]
image_path = tf.keras.utils.get_file(name, origin=url)
img = PIL.Image.open(image_path)
if max_dim:
img.thumbnail((max_dim, max_dim))
return np.array(img)
# Normalize an image
def deprocess(img):
img = 255*(img + 1.0)/2.0
return tf.cast(img, tf.uint8)
# Display an image
def show(img):
display.display(PIL.Image.fromarray(np.array(img)))
# Downsizing the image makes it easier to work with.
original_img = download(url, max_dim=500)
show(original_img)
display.display(display.HTML('Image cc-by: <a href="https://commons.wikimedia.org/wiki/File:Felis_catus-cat_on_snow.jpg">Von.grzanka</a>'))
```
```py
Downloading data from https://storage.googleapis.com/download.tensorflow.org/example_images/YellowLabradorLooking_new.jpg
90112/83281 [================================] - 0s 0us/step
```
![png](img/61002e329110c6cb1db1a82acd8d232f.png)
## Prepare the feature extraction model
Download and prepare a pre-trained image classification model. You will use [InceptionV3](https://keras.io/applications/#inceptionv3) which is similar to the model originally used in DeepDream. Note that any [pre-trained model](https://keras.io/applications/#models-for-image-classification-with-weights-trained-on-imagenet) will work, although you will have to adjust the layer names below if you change this.
```py
base_model = tf.keras.applications.InceptionV3(include_top=False, weights='imagenet')
```
```py
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/inception_v3/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5
87916544/87910968 [==============================] - 2s 0us/step
```
The idea in DeepDream is to choose a layer (or layers) and maximize the "loss" in a way that the image increasingly "excites" the layers. The complexity of the features incorporated depends on layers chosen by you, i.e, lower layers produce strokes or simple patterns, while deeper layers give sophisticated features in images, or even whole objects.
The InceptionV3 architecture is quite large (for a graph of the model architecture see TensorFlow's [research repo](https://github.com/tensorflow/models/tree/master/research/inception)). For DeepDream, the layers of interest are those where the convolutions are concatenated. There are 11 of these layers in InceptionV3, named 'mixed0' through 'mixed10'. Using different layers will result in different dream-like images. Deeper layers respond to higher-level features (such as eyes and faces), while earlier layers respond to simpler features (such as edges, shapes, and textures). Feel free to experiment with the layers selected below, but keep in mind that deeper layers (those with a higher index) will take longer to train on since the gradient computation is deeper.
```py
# Maximize the activations of these layers
names = ['mixed3', 'mixed5']
layers = [base_model.get_layer(name).output for name in names]
# Create the feature extraction model
dream_model = tf.keras.Model(inputs=base_model.input, outputs=layers)
```
## Calculate loss
The loss is the sum of the activations in the chosen layers. The loss is normalized at each layer so the contribution from larger layers does not outweigh smaller layers. Normally, loss is a quantity you wish to minimize via gradient descent. In DeepDream, you will maximize this loss via gradient ascent.
```py
def calc_loss(img, model):
# Pass forward the image through the model to retrieve the activations.
# Converts the image into a batch of size 1.
img_batch = tf.expand_dims(img, axis=0)
layer_activations = model(img_batch)
if len(layer_activations) == 1:
layer_activations = [layer_activations]
losses = []
for act in layer_activations:
loss = tf.math.reduce_mean(act)
losses.append(loss)
return tf.reduce_sum(losses)
```
## Gradient ascent
Once you have calculated the loss for the chosen layers, all that is left is to calculate the gradients with respect to the image, and add them to the original image.
Adding the gradients to the image enhances the patterns seen by the network. At each step, you will have created an image that increasingly excites the activations of certain layers in the network.
The method that does this, below, is wrapped in a [`tf.function`](https://tensorflow.google.cn/api_docs/python/tf/function) for performance. It uses an `input_signature` to ensure that the function is not retraced for different image sizes or `steps`/`step_size` values. See the [Concrete functions guide](https://tensorflow.google.cn/guide/concrete_function) for details.
```py
class DeepDream(tf.Module):
def __init__(self, model):
self.model = model
@tf.function(
input_signature=(
tf.TensorSpec(shape=[None,None,3], dtype=tf.float32),
tf.TensorSpec(shape=[], dtype=tf.int32),
tf.TensorSpec(shape=[], dtype=tf.float32),)
)
def __call__(self, img, steps, step_size):
print("Tracing")
loss = tf.constant(0.0)
for n in tf.range(steps):
with tf.GradientTape() as tape:
# This needs gradients relative to `img`
# `GradientTape` only watches `tf.Variable`s by default
tape.watch(img)
loss = calc_loss(img, self.model)
# Calculate the gradient of the loss with respect to the pixels of the input image.
gradients = tape.gradient(loss, img)
# Normalize the gradients.
gradients /= tf.math.reduce_std(gradients) + 1e-8
# In gradient ascent, the "loss" is maximized so that the input image increasingly "excites" the layers.
# You can update the image by directly adding the gradients (because they're the same shape!)
img = img + gradients*step_size
img = tf.clip_by_value(img, -1, 1)
return loss, img
```
```py
deepdream = DeepDream(dream_model)
```
## Main Loop
```py
def run_deep_dream_simple(img, steps=100, step_size=0.01):
# Convert from uint8 to the range expected by the model.
img = tf.keras.applications.inception_v3.preprocess_input(img)
img = tf.convert_to_tensor(img)
step_size = tf.convert_to_tensor(step_size)
steps_remaining = steps
step = 0
while steps_remaining:
if steps_remaining>100:
run_steps = tf.constant(100)
else:
run_steps = tf.constant(steps_remaining)
steps_remaining -= run_steps
step += run_steps
loss, img = deepdream(img, run_steps, tf.constant(step_size))
display.clear_output(wait=True)
show(deprocess(img))
print ("Step {}, loss {}".format(step, loss))
result = deprocess(img)
display.clear_output(wait=True)
show(result)
return result
```
```py
dream_img = run_deep_dream_simple(img=original_img,
steps=100, step_size=0.01)
```
![png](img/e47b08aec7cc62d5268c6c6af8cf2b16.png)
## Taking it up an octave
Pretty good, but there are a few issues with this first attempt:
1. The output is noisy (this could be addressed with a [`tf.image.total_variation`](https://tensorflow.google.cn/api_docs/python/tf/image/total_variation) loss).
2. The image is low resolution.
3. The patterns appear like they're all happening at the same granularity.
One approach that addresses all these problems is applying gradient ascent at different scales. This will allow patterns generated at smaller scales to be incorporated into patterns at higher scales and filled in with additional detail.
To do this you can perform the previous gradient ascent approach, then increase the size of the image (which is referred to as an octave), and repeat this process for multiple octaves.
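For instance, assuming a 500x375 base image (a hypothetical size chosen only for illustration), the five octaves in the loop below work at roughly these sizes; the arithmetic mirrors the float-multiply-then-int-cast in the code:

```python
OCTAVE_SCALE = 1.30
base_shape = (500, 375)  # an assumed base size, for illustration only

# One entry per octave n in range(-2, 3), truncating like the int32 cast below.
octave_shapes = [
    tuple(int(dim * OCTAVE_SCALE ** n) for dim in base_shape)
    for n in range(-2, 3)
]
print(octave_shapes)
# [(295, 221), (384, 288), (500, 375), (650, 487), (845, 633)]
```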
```py
import time
start = time.time()
OCTAVE_SCALE = 1.30
img = tf.constant(np.array(original_img))
base_shape = tf.shape(img)[:-1]
float_base_shape = tf.cast(base_shape, tf.float32)
for n in range(-2, 3):
new_shape = tf.cast(float_base_shape*(OCTAVE_SCALE**n), tf.int32)
img = tf.image.resize(img, new_shape).numpy()
img = run_deep_dream_simple(img=img, steps=50, step_size=0.01)
display.clear_output(wait=True)
img = tf.image.resize(img, base_shape)
img = tf.image.convert_image_dtype(img/255.0, dtype=tf.uint8)
show(img)
end = time.time()
end-start
```
![png](img/a3d4072cdd299fedb28dda8fdab7e611.png)
```py
5.535110235214233
```
## Optional: Scaling up with tiles
One thing to consider is that as the image increases in size, so will the time and memory necessary to perform the gradient calculation. The above octave implementation will not work on very large images, or many octaves.
To avoid this issue you can split the image into tiles and compute the gradient for each tile.
Applying random shifts to the image before each tiled computation prevents tile seams from appearing.
Start by implementing the random shift:
```py
def random_roll(img, maxroll):
# Randomly shift the image to avoid tiled boundaries.
shift = tf.random.uniform(shape=[2], minval=-maxroll, maxval=maxroll, dtype=tf.int32)
img_rolled = tf.roll(img, shift=shift, axis=[0,1])
return shift, img_rolled
```
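The shift returned here is used again at the end of the tiled pass to roll the gradients back. A quick NumPy check (illustrative, not from the tutorial) shows that a roll followed by the negated roll restores the array exactly:

```python
import numpy as np

img = np.arange(24).reshape(4, 6)   # a stand-in "image"
shift = np.array([1, -2])           # one sample shift along (height, width)

rolled = np.roll(img, shift=shift, axis=(0, 1))
restored = np.roll(rolled, shift=-shift, axis=(0, 1))

assert rolled[0, 0] == img[3, 2]      # pixels really moved
assert np.array_equal(restored, img)  # and the roll is exactly invertible
```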
```py
shift, img_rolled = random_roll(np.array(original_img), 512)
show(img_rolled)
```
![png](img/47c750cbb275e148fd8d76c4bf49d4a6.png)
Here is a tiled equivalent of the `deepdream` function defined earlier:
```py
class TiledGradients(tf.Module):
def __init__(self, model):
self.model = model
@tf.function(
input_signature=(
tf.TensorSpec(shape=[None,None,3], dtype=tf.float32),
tf.TensorSpec(shape=[], dtype=tf.int32),)
)
def __call__(self, img, tile_size=512):
shift, img_rolled = random_roll(img, tile_size)
# Initialize the image gradients to zero.
gradients = tf.zeros_like(img_rolled)
# Skip the last tile, unless there's only one tile.
xs = tf.range(0, img_rolled.shape[0], tile_size)[:-1]
if not tf.cast(len(xs), bool):
xs = tf.constant([0])
ys = tf.range(0, img_rolled.shape[1], tile_size)[:-1]
if not tf.cast(len(ys), bool):
ys = tf.constant([0])
for x in xs:
for y in ys:
# Calculate the gradients for this tile.
with tf.GradientTape() as tape:
# This needs gradients relative to `img_rolled`.
# `GradientTape` only watches `tf.Variable`s by default.
tape.watch(img_rolled)
# Extract a tile out of the image.
img_tile = img_rolled[x:x+tile_size, y:y+tile_size]
loss = calc_loss(img_tile, self.model)
# Update the image gradients for this tile.
gradients = gradients + tape.gradient(loss, img_rolled)
# Undo the random shift applied to the image and its gradients.
gradients = tf.roll(gradients, shift=-shift, axis=[0,1])
# Normalize the gradients.
gradients /= tf.math.reduce_std(gradients) + 1e-8
return gradients
```
```py
get_tiled_gradients = TiledGradients(dream_model)
```
Putting this together gives a scalable, octave-aware deepdream implementation:
```py
def run_deep_dream_with_octaves(img, steps_per_octave=100, step_size=0.01,
octaves=range(-2,3), octave_scale=1.3):
base_shape = tf.shape(img)
img = tf.keras.preprocessing.image.img_to_array(img)
img = tf.keras.applications.inception_v3.preprocess_input(img)
initial_shape = img.shape[:-1]
img = tf.image.resize(img, initial_shape)
for octave in octaves:
# Scale the image based on the octave
new_size = tf.cast(tf.convert_to_tensor(base_shape[:-1]), tf.float32)*(octave_scale**octave)
img = tf.image.resize(img, tf.cast(new_size, tf.int32))
for step in range(steps_per_octave):
gradients = get_tiled_gradients(img)
img = img + gradients*step_size
img = tf.clip_by_value(img, -1, 1)
if step % 10 == 0:
display.clear_output(wait=True)
show(deprocess(img))
print ("Octave {}, Step {}".format(octave, step))
result = deprocess(img)
return result
```
```py
img = run_deep_dream_with_octaves(img=original_img, step_size=0.01)
display.clear_output(wait=True)
img = tf.image.resize(img, base_shape)
img = tf.image.convert_image_dtype(img/255.0, dtype=tf.uint8)
show(img)
```
![png](img/1c3bc0a947aefadc9c04f9c5f0bf1991.png)
Much better! Play around with the number of octaves, octave scale, and activated layers to change how your DeepDream-ed image looks.
Readers might also be interested in [TensorFlow Lucid](https://github.com/tensorflow/lucid) which expands on ideas introduced in this tutorial to visualize and interpret neural networks.

# Deep Convolutional Generative Adversarial Network
> Original: [https://tensorflow.google.cn/tutorials/generative/dcgan](https://tensorflow.google.cn/tutorials/generative/dcgan)
**Note:** Our TensorFlow community has translated these documents. Because community translations are best-effort, there is no guarantee that they are an accurate and up-to-date reflection of the [official English documentation](https://tensorflow.google.cn/?hl=en). If you have suggestions to improve this translation, please submit a pull request to the [tensorflow/docs](https://github.com/tensorflow/docs) GitHub repository. To volunteer to write or review translations, join the [docs-zh-cn@tensorflow.org Google Group](https://groups.google.com/a/tensorflow.org/forum/#!forum/docs-zh-cn).
This tutorial demonstrates how to generate images of handwritten digits using a [Deep Convolutional Generative Adversarial Network](https://arxiv.org/pdf/1511.06434.pdf) (DCGAN). The code is written using the [Keras Sequential API](https://tensorflow.google.cn/guide/keras) with a [`tf.GradientTape`](https://tensorflow.google.cn/api_docs/python/tf/GradientTape) training loop.
## What are GANs?
[Generative Adversarial Networks](https://arxiv.org/abs/1406.2661) (GANs) are one of the most interesting ideas in computer science today. Two models are trained simultaneously by an adversarial process. A *generator* ("the artist") learns to create images that look real, while a *discriminator* ("the art critic") learns to tell real images apart from fakes.
![A diagram of a generator and discriminator](img/d6513785291f1616fa5a88b830c9a438.png)
During training, the *generator* progressively becomes better at creating images that look real, while the *discriminator* becomes better at telling them apart. The process reaches equilibrium when the *discriminator* can no longer distinguish real images from fakes.
![A second diagram of a generator and discriminator](img/a84da0fdd95c0b8365360f941f57e017.png)
This notebook demonstrates this process on the MNIST dataset. The following animation shows a series of images produced by the *generator* as it was trained for 50 epochs. The images begin as random noise, and increasingly resemble handwritten digits over time.
![sample output](img/2e071a2b770d50ed5ef40dabbe1fd1a7.png)
To learn more about GANs, we recommend MIT's [Intro to Deep Learning](http://introtodeeplearning.com/) course.
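Though not written out in this tutorial, the adversarial process described above is usually formalized (as in the GAN paper linked earlier) as a minimax game between the two networks:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

Here $D(x)$ is the discriminator's estimate of the probability that $x$ is a real image, and $G(z)$ is the image the generator produces from noise $z$; the discriminator takes gradient steps to ascend this objective while the generator takes steps to descend it.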
### Import TensorFlow and other libraries
```py
import tensorflow as tf
```
```py
tf.__version__
```
```py
'2.3.0'
```
```py
# To generate GIFs
pip install -q imageio
```
```py
WARNING: You are using pip version 20.2.2; however, version 20.2.3 is available.
You should consider upgrading via the '/tmpfs/src/tf_docs_env/bin/python -m pip install --upgrade pip' command.
```
```py
import glob
import imageio
import matplotlib.pyplot as plt
import numpy as np
import os
import PIL
from tensorflow.keras import layers
import time
from IPython import display
```
### Load and prepare the dataset
You will use the MNIST dataset to train the generator and the discriminator. The generator will generate handwritten digits resembling the MNIST data.
```py
(train_images, train_labels), (_, _) = tf.keras.datasets.mnist.load_data()
```
```py
train_images = train_images.reshape(train_images.shape[0], 28, 28, 1).astype('float32')
train_images = (train_images - 127.5) / 127.5  # Normalize the images to [-1, 1]
```
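The `[-1, 1]` scaling above, and the inverse map used later when the generated images are plotted, can be sanity-checked with plain NumPy. The helper names below are illustrative, not part of the tutorial code:

```py
import numpy as np

def normalize(pixels):
    # Map uint8 pixel values in [0, 255] to floats in [-1, 1]
    return (pixels.astype('float32') - 127.5) / 127.5

def denormalize(values):
    # Inverse map: [-1, 1] back to [0, 255], as used when plotting
    return values * 127.5 + 127.5

pixels = np.array([0, 127.5, 255])
scaled = normalize(pixels)        # -> [-1, 0, 1]
restored = denormalize(scaled)    # -> back to [0, 127.5, 255]
print(scaled, restored)
```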
```py
BUFFER_SIZE = 60000
BATCH_SIZE = 256
```
```py
# Batch and shuffle the data
train_dataset = tf.data.Dataset.from_tensor_slices(train_images).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)
```
## Create the models
Both the generator and the discriminator are defined using the [Keras Sequential API](https://tensorflow.google.cn/guide/keras#sequential_model).
### The Generator
The generator uses [`tf.keras.layers.Conv2DTranspose`](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/Conv2DTranspose) (upsampling) layers to produce an image from a seed (random noise). Start with a `Dense` layer that takes this seed as input, then upsample several times until you reach the desired image size of 28x28x1. Notice the [`tf.keras.layers.LeakyReLU`](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/LeakyReLU) activation for each layer, except the output layer, which uses tanh.
```py
def make_generator_model():
    model = tf.keras.Sequential()
    model.add(layers.Dense(7*7*256, use_bias=False, input_shape=(100,)))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Reshape((7, 7, 256)))
    assert model.output_shape == (None, 7, 7, 256)  # Note: None is the batch size

    model.add(layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False))
    assert model.output_shape == (None, 7, 7, 128)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False))
    assert model.output_shape == (None, 14, 14, 64)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same', use_bias=False, activation='tanh'))
    assert model.output_shape == (None, 28, 28, 1)

    return model
```
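The spatial-shape progression asserted in `make_generator_model` can be checked by hand: for `Conv2DTranspose` with `padding='same'`, the output size is the input size times the stride. A quick sketch (the helper function is illustrative):

```py
def conv_transpose_out(size, stride):
    # Keras Conv2DTranspose with padding='same': output size = input size * stride
    return size * stride

size = 7  # spatial size after the Dense + Reshape to (7, 7, 256)
for stride in (1, 2, 2):  # strides of the three Conv2DTranspose layers
    size = conv_transpose_out(size, stride)
print(size)  # 28, the MNIST image size
```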
Use the (as yet untrained) generator to create an image.
```py
generator = make_generator_model()
noise = tf.random.normal([1, 100])
generated_image = generator(noise, training=False)
plt.imshow(generated_image[0, :, :, 0], cmap='gray')
```
```py
<matplotlib.image.AxesImage at 0x7f01d26074a8>
```
![png](img/22f7bd226b742292050c368b980067f4.png)
### The Discriminator
The discriminator is a CNN-based image classifier.
```py
def make_discriminator_model():
    model = tf.keras.Sequential()
    model.add(layers.Conv2D(64, (5, 5), strides=(2, 2), padding='same',
                            input_shape=[28, 28, 1]))
    model.add(layers.LeakyReLU())
    model.add(layers.Dropout(0.3))

    model.add(layers.Conv2D(128, (5, 5), strides=(2, 2), padding='same'))
    model.add(layers.LeakyReLU())
    model.add(layers.Dropout(0.3))

    model.add(layers.Flatten())
    model.add(layers.Dense(1))

    return model
```
Use the (as yet untrained) discriminator to classify the generated images as real or fake. The model will be trained to output positive values for real images, and negative values for fake images.
```py
discriminator = make_discriminator_model()
decision = discriminator(generated_image)
print (decision)
```
```py
tf.Tensor([[-0.00427552]], shape=(1, 1), dtype=float32)
```
## Define the loss and optimizers
Define loss functions and optimizers for both models.
```py
# This method returns a helper function to compute cross entropy loss
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)
```
### Discriminator loss
This method quantifies how well the discriminator is able to distinguish real images from fakes. It compares the discriminator's predictions on real images to an array of 1s, and the discriminator's predictions on fake (generated) images to an array of 0s.
```py
def discriminator_loss(real_output, fake_output):
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    total_loss = real_loss + fake_loss
    return total_loss
```
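The arithmetic behind `discriminator_loss` can be reproduced without TensorFlow. The helper below mirrors what `BinaryCrossentropy(from_logits=True)` computes for a single scalar logit; it is a numerically naive sketch for illustration, not the Keras implementation:

```py
import math

def bce_from_logit(label, logit):
    # Sigmoid cross entropy for one example (naive, for illustration only)
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

# An untrained discriminator outputs logits near 0, i.e. p = 0.5,
# so both terms of the loss are ln 2 and the total is about 1.386
real_loss = bce_from_logit(1, 0.0)
fake_loss = bce_from_logit(0, 0.0)
print(real_loss + fake_loss)
```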
### Generator loss
The generator's loss quantifies how well it was able to trick the discriminator. Intuitively, if the generator is performing well, the discriminator will classify the fake images as real (or 1). Here, we compare the discriminator's decisions on the generated images to an array of 1s.
```py
def generator_loss(fake_output):
    return cross_entropy(tf.ones_like(fake_output), fake_output)
```
The discriminator and the generator optimizers are different since we will train the two networks separately.
```py
generator_optimizer = tf.keras.optimizers.Adam(1e-4)
discriminator_optimizer = tf.keras.optimizers.Adam(1e-4)
```
### Save checkpoints
This notebook also demonstrates how to save and restore models, which can be helpful in case a long running training task is interrupted.
```py
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
checkpoint = tf.train.Checkpoint(generator_optimizer=generator_optimizer,
discriminator_optimizer=discriminator_optimizer,
generator=generator,
discriminator=discriminator)
```
## Define the training loop
```py
EPOCHS = 50
noise_dim = 100
num_examples_to_generate = 16
# We will reuse this seed over time (so it's easier to visualize progress in the animated GIF)
seed = tf.random.normal([num_examples_to_generate, noise_dim])
```
The training loop begins with the generator receiving a random seed as input. That seed is used to produce an image. The discriminator is then used to classify real images (drawn from the training set) and fake images (produced by the generator). The loss is calculated for each of these models, and the gradients are used to update the generator and the discriminator.
```py
# Notice the use of `tf.function`
# This annotation causes the function to be "compiled"
@tf.function
def train_step(images):
    noise = tf.random.normal([BATCH_SIZE, noise_dim])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)

        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)

        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)

    generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))
```
```py
def train(dataset, epochs):
    for epoch in range(epochs):
        start = time.time()

        for image_batch in dataset:
            train_step(image_batch)

        # Produce images for the GIF as we go
        display.clear_output(wait=True)
        generate_and_save_images(generator,
                                 epoch + 1,
                                 seed)

        # Save the model every 15 epochs
        if (epoch + 1) % 15 == 0:
            checkpoint.save(file_prefix=checkpoint_prefix)

        print('Time for epoch {} is {} sec'.format(epoch + 1, time.time()-start))

    # Generate after the final epoch
    display.clear_output(wait=True)
    generate_and_save_images(generator,
                             epochs,
                             seed)
```
**Generate and save images**
```py
def generate_and_save_images(model, epoch, test_input):
    # Notice `training` is set to False.
    # This is so all layers run in inference mode (batchnorm).
    predictions = model(test_input, training=False)

    fig = plt.figure(figsize=(4, 4))

    for i in range(predictions.shape[0]):
        plt.subplot(4, 4, i+1)
        plt.imshow(predictions[i, :, :, 0] * 127.5 + 127.5, cmap='gray')
        plt.axis('off')

    plt.savefig('image_at_epoch_{:04d}.png'.format(epoch))
    plt.show()
```
## Train the model
Call the `train()` method defined above to train the generator and discriminator simultaneously. Note that training GANs can be tricky. It's important that the generator and discriminator do not overpower each other (e.g. that they train at a similar rate).
At the beginning of training, the generated images look like random noise. As training progresses, the generated digits will look increasingly real. After about 50 epochs, they resemble MNIST digits. This may take about one minute per epoch with the default settings on Colab.
```py
%%time
train(train_dataset, EPOCHS)
```
![png](img/f3c5a66b35a03bd6a2bf9c3a65a39dfb.png)
```py
CPU times: user 1min 52s, sys: 11.7 s, total: 2min 4s
Wall time: 3min 22s
```
Restore the latest checkpoint.
```py
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))
```
```py
<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7f0118537668>
```
## Create a GIF
```py
# Display a single image using the epoch number
def display_image(epoch_no):
    return PIL.Image.open('image_at_epoch_{:04d}.png'.format(epoch_no))
```
```py
display_image(EPOCHS)
```
![png](img/c12f3797e75b6aa8bdc206f4b91344c1.png)
Use `imageio` to create an animated gif from the images saved during training.
```py
anim_file = 'dcgan.gif'
with imageio.get_writer(anim_file, mode='I') as writer:
    filenames = glob.glob('image*.png')
    filenames = sorted(filenames)
    last = -1
    for i, filename in enumerate(filenames):
        frame = 2*(i**0.5)
        if round(frame) > round(last):
            last = frame
        else:
            continue
        image = imageio.imread(filename)
        writer.append_data(image)
    image = imageio.imread(filename)
    writer.append_data(image)

import IPython
if IPython.version_info > (6, 2, 0, ''):
    display.Image(filename=anim_file)
```
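The `frame = 2*(i**0.5)` rule above subsamples the saved images so the GIF spends more time on early epochs, where progress is fastest. Which indices survive can be checked in plain Python (the function name is just for illustration):

```py
def kept_frames(n):
    # Keep frame i only when round(2*sqrt(i)) advances past the last kept value
    kept, last = [], -1
    for i in range(n):
        frame = 2 * (i ** 0.5)
        if round(frame) > round(last):
            last = frame
            kept.append(i)
    return kept

# Early frames are dense, later frames increasingly sparse
print(kept_frames(20))
```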
If you're working in Colab, you can download the animation with the code below:
```py
try:
    from google.colab import files
except ImportError:
    pass
else:
    files.download(anim_file)
```
## Next steps
This tutorial has shown the complete code necessary to write and train a GAN. As a next step, you might like to experiment with a different dataset, for example the Large-scale Celeb Faces Attributes (CelebA) dataset [available on Kaggle](https://www.kaggle.com/jessicali9530/celeba-dataset). To learn more about GANs, we recommend the [NIPS 2016 Tutorial: Generative Adversarial Networks](https://arxiv.org/abs/1701.00160).

# Pix2Pix
> Original: [https://tensorflow.google.cn/tutorials/generative/pix2pix](https://tensorflow.google.cn/tutorials/generative/pix2pix)
This notebook demonstrates image-to-image translation using conditional GANs, as described in [Image-to-Image Translation with Conditional Adversarial Networks](https://arxiv.org/abs/1611.07004). Using this technique we can colorize black and white photos, convert google maps to google earth, etc. Here, we convert building facades to real buildings.
In this example, we will use the [CMP Facade Database](http://cmp.felk.cvut.cz/%7Etylecr1/facade/), helpfully provided by the [Center for Machine Perception](http://cmp.felk.cvut.cz/) at the [Czech Technical University in Prague](https://www.cvut.cz/). To keep our example short, we will use a preprocessed [copy](https://people.eecs.berkeley.edu/%7Etinghuiz/projects/pix2pix/datasets/) of this dataset, created by the authors of the [paper](https://arxiv.org/abs/1611.07004) above.
Each epoch takes around 15 seconds on a single V100 GPU.
Below is the output generated after training the model for 200 epochs.
![sample output_1](img/e297781397cdc97e304b45625f7ae423.png) ![sample output_2](img/7f05b53be9225270c3955654d7d465de.png)
## Import TensorFlow and other libraries
```py
import tensorflow as tf
import os
import time
from matplotlib import pyplot as plt
from IPython import display
```
```py
pip install -q -U tensorboard
```
```py
WARNING: You are using pip version 20.2.2; however, version 20.2.3 is available.
You should consider upgrading via the '/tmpfs/src/tf_docs_env/bin/python -m pip install --upgrade pip' command.
```
## Load the dataset
You can download this dataset and similar datasets from [here](https://people.eecs.berkeley.edu/%7Etinghuiz/projects/pix2pix/datasets). As mentioned in the [paper](https://arxiv.org/abs/1611.07004) we apply random jittering and mirroring to the training dataset.
* In random jittering, the image is resized to `286 x 286` and then randomly cropped to `256 x 256`.
* In random mirroring, the image is randomly flipped horizontally, i.e. left to right.
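The resize-then-crop jitter leaves 286 - 256 = 30 pixels of slack in each dimension, from which `tf.image.random_crop` picks an offset uniformly. A pure-Python sketch of the offset range (the helper is illustrative, not the TensorFlow implementation):

```py
import random

RESIZED, TARGET = 286, 256

def random_crop_offsets(rng):
    # random_crop draws each offset uniformly from [0, resized - target]
    slack = RESIZED - TARGET
    return rng.randint(0, slack), rng.randint(0, slack)

rng = random.Random(0)
y, x = random_crop_offsets(rng)
print(y, x)  # both offsets lie in [0, 30]
```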
```py
_URL = 'https://people.eecs.berkeley.edu/~tinghuiz/projects/pix2pix/datasets/facades.tar.gz'
path_to_zip = tf.keras.utils.get_file('facades.tar.gz',
origin=_URL,
extract=True)
PATH = os.path.join(os.path.dirname(path_to_zip), 'facades/')
```
```py
Downloading data from https://people.eecs.berkeley.edu/~tinghuiz/projects/pix2pix/datasets/facades.tar.gz
30171136/30168306 [==============================] - 2s 0us/step
```
```py
BUFFER_SIZE = 400
BATCH_SIZE = 1
IMG_WIDTH = 256
IMG_HEIGHT = 256
```
```py
def load(image_file):
    image = tf.io.read_file(image_file)
    image = tf.image.decode_jpeg(image)

    w = tf.shape(image)[1]
    w = w // 2
    real_image = image[:, :w, :]
    input_image = image[:, w:, :]

    input_image = tf.cast(input_image, tf.float32)
    real_image = tf.cast(real_image, tf.float32)

    return input_image, real_image
```
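Each facade file in this dataset stores two 256×256 images side by side in a single 512-pixel-wide JPEG, which is what the width split in `load` exploits. The split can be illustrated with a dummy array (the values are made up):

```py
import numpy as np

# Stand-in for a decoded facade JPEG: height 256, width 512, 3 channels
image = np.zeros((256, 512, 3), dtype=np.float32)
image[:, 256:, :] = 1.0  # mark the right half so the split is visible

w = image.shape[1] // 2
real_image = image[:, :w, :]   # left half
input_image = image[:, w:, :]  # right half

print(real_image.shape, input_image.shape)  # (256, 256, 3) (256, 256, 3)
```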
```py
inp, re = load(PATH+'train/100.jpg')
# casting to int for matplotlib to show the image
plt.figure()
plt.imshow(inp/255.0)
plt.figure()
plt.imshow(re/255.0)
```
```py
<matplotlib.image.AxesImage at 0x7f5576b28550>
```
![png](img/52194b6e27c77c651d0f3c56066448f5.png)
![png](img/ab876a0a7878b27ea0658f95d96f1ddb.png)
```py
def resize(input_image, real_image, height, width):
    input_image = tf.image.resize(input_image, [height, width],
                                  method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    real_image = tf.image.resize(real_image, [height, width],
                                 method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    return input_image, real_image
```
```py
def random_crop(input_image, real_image):
    stacked_image = tf.stack([input_image, real_image], axis=0)
    cropped_image = tf.image.random_crop(
        stacked_image, size=[2, IMG_HEIGHT, IMG_WIDTH, 3])
    return cropped_image[0], cropped_image[1]
```
```py
# normalizing the images to [-1, 1]
def normalize(input_image, real_image):
    input_image = (input_image / 127.5) - 1
    real_image = (real_image / 127.5) - 1
    return input_image, real_image
```
```py
@tf.function()
def random_jitter(input_image, real_image):
    # resizing to 286 x 286 x 3
    input_image, real_image = resize(input_image, real_image, 286, 286)

    # randomly cropping to 256 x 256 x 3
    input_image, real_image = random_crop(input_image, real_image)

    if tf.random.uniform(()) > 0.5:
        # random mirroring
        input_image = tf.image.flip_left_right(input_image)
        real_image = tf.image.flip_left_right(real_image)

    return input_image, real_image
```
As you can see in the images below, they are going through random jittering. Random jittering, as described in the paper, consists of three steps:
1. Resize an image to a bigger height and width
2. Randomly crop to the target size
3. Randomly flip the image horizontally
```py
plt.figure(figsize=(6, 6))
for i in range(4):
    rj_inp, rj_re = random_jitter(inp, re)
    plt.subplot(2, 2, i+1)
    plt.imshow(rj_inp/255.0)
    plt.axis('off')
plt.show()
```
![png](img/be737507a3c4409c7dc8aa33d2196e15.png)
```py
def load_image_train(image_file):
    input_image, real_image = load(image_file)
    input_image, real_image = random_jitter(input_image, real_image)
    input_image, real_image = normalize(input_image, real_image)
    return input_image, real_image
```
```py
def load_image_test(image_file):
    input_image, real_image = load(image_file)
    input_image, real_image = resize(input_image, real_image,
                                     IMG_HEIGHT, IMG_WIDTH)
    input_image, real_image = normalize(input_image, real_image)
    return input_image, real_image
```
## Input Pipeline
```py
train_dataset = tf.data.Dataset.list_files(PATH+'train/*.jpg')
train_dataset = train_dataset.map(load_image_train,
num_parallel_calls=tf.data.experimental.AUTOTUNE)
train_dataset = train_dataset.shuffle(BUFFER_SIZE)
train_dataset = train_dataset.batch(BATCH_SIZE)
```
```py
test_dataset = tf.data.Dataset.list_files(PATH+'test/*.jpg')
test_dataset = test_dataset.map(load_image_test)
test_dataset = test_dataset.batch(BATCH_SIZE)
```
## Build the Generator
* The architecture of the generator is a modified U-Net.
* Each block in the encoder is (Conv -> Batchnorm -> Leaky ReLU)
* Each block in the decoder is (Transposed Conv -> Batchnorm -> Dropout(applied to the first 3 blocks) -> ReLU)
* There are skip connections between the encoder and decoder (as in U-Net).
```py
OUTPUT_CHANNELS = 3
```
```py
def downsample(filters, size, apply_batchnorm=True):
    initializer = tf.random_normal_initializer(0., 0.02)

    result = tf.keras.Sequential()
    result.add(
        tf.keras.layers.Conv2D(filters, size, strides=2, padding='same',
                               kernel_initializer=initializer, use_bias=False))

    if apply_batchnorm:
        result.add(tf.keras.layers.BatchNormalization())

    result.add(tf.keras.layers.LeakyReLU())

    return result
```
```py
down_model = downsample(3, 4)
down_result = down_model(tf.expand_dims(inp, 0))
print (down_result.shape)
```
```py
(1, 128, 128, 3)
```
```py
def upsample(filters, size, apply_dropout=False):
    initializer = tf.random_normal_initializer(0., 0.02)

    result = tf.keras.Sequential()
    result.add(
        tf.keras.layers.Conv2DTranspose(filters, size, strides=2,
                                        padding='same',
                                        kernel_initializer=initializer,
                                        use_bias=False))

    result.add(tf.keras.layers.BatchNormalization())

    if apply_dropout:
        result.add(tf.keras.layers.Dropout(0.5))

    result.add(tf.keras.layers.ReLU())

    return result
```
```py
up_model = upsample(3, 4)
up_result = up_model(down_result)
print (up_result.shape)
```
```py
(1, 256, 256, 3)
```
```py
def Generator():
    inputs = tf.keras.layers.Input(shape=[256, 256, 3])

    down_stack = [
        downsample(64, 4, apply_batchnorm=False),  # (bs, 128, 128, 64)
        downsample(128, 4),  # (bs, 64, 64, 128)
        downsample(256, 4),  # (bs, 32, 32, 256)
        downsample(512, 4),  # (bs, 16, 16, 512)
        downsample(512, 4),  # (bs, 8, 8, 512)
        downsample(512, 4),  # (bs, 4, 4, 512)
        downsample(512, 4),  # (bs, 2, 2, 512)
        downsample(512, 4),  # (bs, 1, 1, 512)
    ]

    up_stack = [
        upsample(512, 4, apply_dropout=True),  # (bs, 2, 2, 1024)
        upsample(512, 4, apply_dropout=True),  # (bs, 4, 4, 1024)
        upsample(512, 4, apply_dropout=True),  # (bs, 8, 8, 1024)
        upsample(512, 4),  # (bs, 16, 16, 1024)
        upsample(256, 4),  # (bs, 32, 32, 512)
        upsample(128, 4),  # (bs, 64, 64, 256)
        upsample(64, 4),  # (bs, 128, 128, 128)
    ]

    initializer = tf.random_normal_initializer(0., 0.02)
    last = tf.keras.layers.Conv2DTranspose(OUTPUT_CHANNELS, 4,
                                           strides=2,
                                           padding='same',
                                           kernel_initializer=initializer,
                                           activation='tanh')  # (bs, 256, 256, 3)

    x = inputs

    # Downsampling through the model
    skips = []
    for down in down_stack:
        x = down(x)
        skips.append(x)

    skips = reversed(skips[:-1])

    # Upsampling and establishing the skip connections
    for up, skip in zip(up_stack, skips):
        x = up(x)
        x = tf.keras.layers.Concatenate()([x, skip])

    x = last(x)

    return tf.keras.Model(inputs=inputs, outputs=x)
```
```py
generator = Generator()
tf.keras.utils.plot_model(generator, show_shapes=True, dpi=64)
```
![png](img/027fe3c7c1b2c8f4ba851311692e3d91.png)
```py
gen_output = generator(inp[tf.newaxis,...], training=False)
plt.imshow(gen_output[0,...])
```
```py
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
<matplotlib.image.AxesImage at 0x7f54c85167b8>
```
![png](img/e4d27c794147e0649dec40c1e673fa3d.png)
* **Generator loss**
* It is a sigmoid cross entropy loss of the generated images and an **array of ones**.
* The [paper](https://arxiv.org/abs/1611.07004) also includes L1 loss which is MAE (mean absolute error) between the generated image and the target image.
* This allows the generated image to become structurally similar to the target image.
* The formula to calculate the total generator loss is `gan_loss + LAMBDA * l1_loss`, where `LAMBDA = 100`. This value was decided by the authors of the [paper](https://arxiv.org/abs/1611.07004).
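The weighting in that formula can be checked with small numbers. A minimal NumPy sketch (the values are made up):

```py
import numpy as np

LAMBDA = 100

def total_generator_loss(gan_loss, generated, target):
    # total = gan_loss + LAMBDA * mean absolute error (L1)
    l1_loss = np.mean(np.abs(target - generated))
    return gan_loss + LAMBDA * l1_loss

generated = np.array([0.0, 0.5, 1.0])
target = np.array([0.0, 0.0, 1.0])
# l1 = mean(0, 0.5, 0) = 1/6, so the weighted L1 term dominates the GAN term
print(total_generator_loss(0.7, generated, target))
```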
The training procedure for the generator is shown below:
```py
LAMBDA = 100
```
```py
def generator_loss(disc_generated_output, gen_output, target):
    gan_loss = loss_object(tf.ones_like(disc_generated_output), disc_generated_output)

    # mean absolute error
    l1_loss = tf.reduce_mean(tf.abs(target - gen_output))

    total_gen_loss = gan_loss + (LAMBDA * l1_loss)

    return total_gen_loss, gan_loss, l1_loss
```
![Generator Update Image](img/b7fd03ac59129ba2515cf59b292f3296.png)
## Build the Discriminator
* The Discriminator is a PatchGAN.
* Each block in the discriminator is (Conv -> BatchNorm -> Leaky ReLU)
* The shape of the output after the last layer is (batch_size, 30, 30, 1)
* Each 30x30 patch of the output classifies a 70x70 portion of the input image (such an architecture is called a PatchGAN).
* Discriminator receives 2 inputs.
* Input image and the target image, which it should classify as real.
* Input image and the generated image (output of generator), which it should classify as fake.
* We concatenate these 2 inputs together in the code (`tf.concat([inp, tar], axis=-1)`)
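The 70x70 receptive-field claim above follows from the kernel sizes and strides of the discriminator's conv stack. The standard receptive-field recurrence, sketched in plain Python (the helper function is illustrative):

```py
def receptive_field(layers):
    # layers: (kernel, stride) pairs, listed from input to output
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field
        jump *= stride             # stride compounds the step between outputs
    return rf

# Three stride-2 downsample blocks, then two stride-1 convs, all kernel size 4
print(receptive_field([(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]))  # 70
```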
```py
def Discriminator():
    initializer = tf.random_normal_initializer(0., 0.02)

    inp = tf.keras.layers.Input(shape=[256, 256, 3], name='input_image')
    tar = tf.keras.layers.Input(shape=[256, 256, 3], name='target_image')

    x = tf.keras.layers.concatenate([inp, tar])  # (bs, 256, 256, channels*2)

    down1 = downsample(64, 4, False)(x)  # (bs, 128, 128, 64)
    down2 = downsample(128, 4)(down1)  # (bs, 64, 64, 128)
    down3 = downsample(256, 4)(down2)  # (bs, 32, 32, 256)

    zero_pad1 = tf.keras.layers.ZeroPadding2D()(down3)  # (bs, 34, 34, 256)
    conv = tf.keras.layers.Conv2D(512, 4, strides=1,
                                  kernel_initializer=initializer,
                                  use_bias=False)(zero_pad1)  # (bs, 31, 31, 512)

    batchnorm1 = tf.keras.layers.BatchNormalization()(conv)

    leaky_relu = tf.keras.layers.LeakyReLU()(batchnorm1)

    zero_pad2 = tf.keras.layers.ZeroPadding2D()(leaky_relu)  # (bs, 33, 33, 512)

    last = tf.keras.layers.Conv2D(1, 4, strides=1,
                                  kernel_initializer=initializer)(zero_pad2)  # (bs, 30, 30, 1)

    return tf.keras.Model(inputs=[inp, tar], outputs=last)
```
```py
discriminator = Discriminator()
tf.keras.utils.plot_model(discriminator, show_shapes=True, dpi=64)
```
![png](img/0425284f7bd595a686480abe82721a04.png)
```py
disc_out = discriminator([inp[tf.newaxis,...], gen_output], training=False)
plt.imshow(disc_out[0,...,-1], vmin=-20, vmax=20, cmap='RdBu_r')
plt.colorbar()
```
```py
<matplotlib.colorbar.Colorbar at 0x7f54c83a3fd0>
```
![png](img/644c999529792fb810f213e660e582b8.png)
**Discriminator loss**
* The discriminator loss function takes 2 inputs: **real images** and **generated images**
* real_loss is a sigmoid cross-entropy loss of the **real images** and an **array of ones** (since these are the real images)
* generated_loss is a sigmoid cross-entropy loss of the **generated images** and an **array of zeros** (since these are the fake images)
* Then the total_loss is the sum of real_loss and generated_loss
```py
loss_object = tf.keras.losses.BinaryCrossentropy(from_logits=True)
```
```py
def discriminator_loss(disc_real_output, disc_generated_output):
    real_loss = loss_object(tf.ones_like(disc_real_output), disc_real_output)

    generated_loss = loss_object(tf.zeros_like(disc_generated_output), disc_generated_output)

    total_disc_loss = real_loss + generated_loss

    return total_disc_loss
```
The training procedure for the discriminator is shown below.
To learn more about the architecture and the hyperparameters you can refer the [paper](https://arxiv.org/abs/1611.07004).
![Discriminator Update Image](img/a49dab0e9e9ab0a58b2928fb2760dab6.png)
## Define the Optimizers and Checkpoint-saver
```py
generator_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
discriminator_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
```
```py
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
checkpoint = tf.train.Checkpoint(generator_optimizer=generator_optimizer,
discriminator_optimizer=discriminator_optimizer,
generator=generator,
discriminator=discriminator)
```
## Generate Images
Write a function to plot some images during training.
* We pass images from the test dataset to the generator.
* The generator will then translate the input image into the output.
* Last step is to plot the predictions and **voila!**
**Note:** The `training=True` is intentional here since we want the batch statistics while running the model on the test dataset. If we use training=False, we will get the accumulated statistics learned from the training dataset (which we don't want)
```py
def generate_images(model, test_input, tar):
    prediction = model(test_input, training=True)
    plt.figure(figsize=(15, 15))

    display_list = [test_input[0], tar[0], prediction[0]]
    title = ['Input Image', 'Ground Truth', 'Predicted Image']

    for i in range(3):
        plt.subplot(1, 3, i+1)
        plt.title(title[i])
        # getting the pixel values between [0, 1] to plot it.
        plt.imshow(display_list[i] * 0.5 + 0.5)
        plt.axis('off')
    plt.show()
```
```py
for example_input, example_target in test_dataset.take(1):
    generate_images(generator, example_input, example_target)
```
![png](img/a2d79e6f20ade2372271c76afeaca800.png)
## Training
* For each example input generate an output.
* The discriminator receives the input_image and the generated image as the first input. The second input is the input_image and the target_image.
* Next, we calculate the generator and the discriminator loss.
* Then, we calculate the gradients of loss with respect to both the generator and the discriminator variables(inputs) and apply those to the optimizer.
* Then log the losses to TensorBoard.
```py
EPOCHS = 150
```
```py
import datetime
log_dir="logs/"
summary_writer = tf.summary.create_file_writer(
log_dir + "fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
```
```py
@tf.function
def train_step(input_image, target, epoch):
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        gen_output = generator(input_image, training=True)

        disc_real_output = discriminator([input_image, target], training=True)
        disc_generated_output = discriminator([input_image, gen_output], training=True)

        gen_total_loss, gen_gan_loss, gen_l1_loss = generator_loss(disc_generated_output, gen_output, target)
        disc_loss = discriminator_loss(disc_real_output, disc_generated_output)

    generator_gradients = gen_tape.gradient(gen_total_loss,
                                            generator.trainable_variables)
    discriminator_gradients = disc_tape.gradient(disc_loss,
                                                 discriminator.trainable_variables)

    generator_optimizer.apply_gradients(zip(generator_gradients,
                                            generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(discriminator_gradients,
                                                discriminator.trainable_variables))

    with summary_writer.as_default():
        tf.summary.scalar('gen_total_loss', gen_total_loss, step=epoch)
        tf.summary.scalar('gen_gan_loss', gen_gan_loss, step=epoch)
        tf.summary.scalar('gen_l1_loss', gen_l1_loss, step=epoch)
        tf.summary.scalar('disc_loss', disc_loss, step=epoch)
```
The actual training loop:
* Iterates over the number of epochs.
* On each epoch it clears the display, and runs `generate_images` to show its progress.
* On each epoch it iterates over the training dataset, printing a '.' for each example.
* It saves a checkpoint every 20 epochs.
```py
def fit(train_ds, epochs, test_ds):
    for epoch in range(epochs):
        start = time.time()

        display.clear_output(wait=True)

        for example_input, example_target in test_ds.take(1):
            generate_images(generator, example_input, example_target)
        print("Epoch: ", epoch)

        # Train
        for n, (input_image, target) in train_ds.enumerate():
            print('.', end='')
            if (n+1) % 100 == 0:
                print()
            train_step(input_image, target, epoch)
        print()

        # saving (checkpoint) the model every 20 epochs
        if (epoch + 1) % 20 == 0:
            checkpoint.save(file_prefix=checkpoint_prefix)

        print('Time taken for epoch {} is {} sec\n'.format(epoch + 1,
                                                           time.time()-start))
    checkpoint.save(file_prefix=checkpoint_prefix)
```
This training loop saves logs you can easily view in TensorBoard to monitor the training progress. Working locally you would launch a separate tensorboard process. In a notebook, if you want to monitor with TensorBoard it's easiest to launch the viewer before starting the training.
To launch the viewer paste the following into a code-cell:
```py
%load_ext tensorboard
%tensorboard --logdir {log_dir}
```
Now run the training loop:
```py
fit(train_dataset, EPOCHS, test_dataset)
```
![png](img/4c8ef6a2c8f0548a9f5bb182b8d3de01.png)
```py
Epoch: 149
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
Time taken for epoch 150 is 16.14578342437744 sec
```
If you want to share the TensorBoard results *publicly* you can upload the logs to [TensorBoard.dev](https://tensorboard.dev/) by copying the following into a code-cell.
**Note:** This requires a Google account.
```py
tensorboard dev upload --logdir {log_dir}
```
**Caution:** This command does not terminate. It's designed to continuously upload the results of long-running experiments. Once your data is uploaded you need to stop it using the "interrupt execution" option in your notebook tool.
You can view the [results of a previous run](https://tensorboard.dev/experiment/lZ0C6FONROaUMfjYkVyJqw) of this notebook on [TensorBoard.dev](https://tensorboard.dev/).
TensorBoard.dev is a managed experience for hosting, tracking, and sharing ML experiments with everyone.
It can also be included inline using an `<iframe>`:
```py
display.IFrame(
src="https://tensorboard.dev/experiment/lZ0C6FONROaUMfjYkVyJqw",
width="100%",
height="1000px")
```
<devsite-iframe><iframe src="/tutorials/generative/pix2pix_528ecc0a7230cf0eefd54a1c1b455500df0787fc66f9b1de7498d3e87694f029.frame" class="framebox inherit-locale " allowfullscreen="" is-upgraded=""></iframe></devsite-iframe>
Interpreting the logs from a GAN is more subtle than for a simple classification or regression model. Things to look for:
* Check that neither model has "won". If either the `gen_gan_loss` or the `disc_loss` gets very low it's an indicator that this model is dominating the other, and you are not successfully training the combined model.
* The value `log(2) = 0.69` is a good reference point for these losses, as it indicates a perplexity of 2: That the discriminator is on average equally uncertain about the two options.
* For the `disc_loss` a value below `0.69` means the discriminator is doing better than random, on the combined set of real+generated images.
* For the `gen_gan_loss` a value below `0.69` means the generator is doing better than random at fooling the discriminator.
* As training progresses the `gen_l1_loss` should go down.
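The `log(2) = 0.69` reference point is just the sigmoid cross-entropy of a logit of zero, i.e. of a discriminator that assigns probability 0.5 to either option. A quick check (naive scalar formula, for illustration only, not the Keras implementation):

```py
import math

def sigmoid_bce(label, logit):
    # sigmoid cross entropy for a single example
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

# At logit 0 the loss equals ln 2 (about 0.693) whatever the true label is
print(sigmoid_bce(1, 0.0), sigmoid_bce(0, 0.0))
```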
## Restore the latest checkpoint and test
```py
ls {checkpoint_dir}
```
```py
checkpoint ckpt-5.data-00000-of-00001
ckpt-1.data-00000-of-00001 ckpt-5.index
ckpt-1.index ckpt-6.data-00000-of-00001
ckpt-2.data-00000-of-00001 ckpt-6.index
ckpt-2.index ckpt-7.data-00000-of-00001
ckpt-3.data-00000-of-00001 ckpt-7.index
ckpt-3.index ckpt-8.data-00000-of-00001
ckpt-4.data-00000-of-00001 ckpt-8.index
ckpt-4.index
```
```py
# restoring the latest checkpoint in checkpoint_dir
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))
```
```py
<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7f4fce701160>
```
## Generate using test dataset
```py
# Run the trained model on a few examples from the test dataset
for inp, tar in test_dataset.take(5):
    generate_images(generator, inp, tar)
```
![png](img/21b3b7303748422d35a6212f940d399c.png)
![png](img/711ebb2cc10e3bb88f77a6eb89fac014.png)
![png](img/7138c243e1e2c00466be2191f6395597.png)
![png](img/a83182d7f6b11d76dd2d428db01ade58.png)
![png](img/5f0049e4eda5b1689106731ac4d622f6.png)

# CycleGAN
> Original: [https://tensorflow.google.cn/tutorials/generative/cyclegan](https://tensorflow.google.cn/tutorials/generative/cyclegan)
<devsite-mathjax config="TeX-AMS-MML_SVG"></devsite-mathjax>
**Note:** The TensorFlow community translated these documents. Because community translations are best-effort, there is no guarantee that they are accurate and up to date with the latest [official English documentation](https://tensorflow.google.cn/?hl=en). If you have suggestions to improve this translation, please submit a pull request to the [tensorflow/docs](https://github.com/tensorflow/docs) GitHub repository. To volunteer to write or review translations, join the [docs-zh-cn@tensorflow.org Google Group](https://groups.google.com/a/tensorflow.org/forum/#!forum/docs-zh-cn).
This notebook demonstrates unpaired image-to-image translation using conditional GANs, as described in [Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks](https://arxiv.org/abs/1703.10593), also known as CycleGAN. The paper proposes a method that can capture the characteristics of one image domain and figure out how these characteristics could be translated into another image domain, all in the absence of any paired training examples.
This notebook assumes you are familiar with Pix2Pix, which you can learn about in the [Pix2Pix tutorial](https://tensorflow.google.cn/tutorials/generative/pix2pix). The code for CycleGAN is similar; the main differences are an additional loss function, and the use of unpaired training data.
CycleGAN uses a cycle consistency loss to enable training without paired data. In other words, it can translate from one domain to another without a one-to-one mapping between the source and target domains.
This opens up the possibility to do a lot of interesting tasks like photo enhancement, image colorization, style transfer, etc. All you need is the source and the target dataset (which is simply a directory of images).
![Output Image 1](img/921588a88d035dfd280c98f420033345.png) ![Output Image 2](img/f89cb56c5d3c77f56118a42ca7fb3936.png)
## Set up the input pipeline
Install the [tensorflow_examples](https://github.com/tensorflow/examples) package to import the generator and the discriminator.
```py
pip install -q git+https://github.com/tensorflow/examples.git
```
```py
WARNING: You are using pip version 20.2.2; however, version 20.2.3 is available.
You should consider upgrading via the '/tmpfs/src/tf_docs_env/bin/python -m pip install --upgrade pip' command.
```
```py
import tensorflow as tf
```
```py
import tensorflow_datasets as tfds
from tensorflow_examples.models.pix2pix import pix2pix
import os
import time
import matplotlib.pyplot as plt
from IPython.display import clear_output
tfds.disable_progress_bar()
AUTOTUNE = tf.data.experimental.AUTOTUNE
```
## Input Pipeline
This tutorial trains a model to translate images of horses into images of zebras. You can find this dataset and similar ones [here](https://tensorflow.google.cn/datasets/datasets#cycle_gan).
As mentioned in the [paper](https://arxiv.org/abs/1703.10593), apply random jittering and mirroring to the training dataset. These are some of the image augmentation techniques that avoid overfitting.
This is similar to what was done in [pix2pix](https://tensorflow.google.cn/tutorials/generative/pix2pix#load_the_dataset).
* In random jittering, the image is resized to `286 x 286` and then randomly cropped to `256 x 256`.
* In random mirroring, the image is randomly flipped horizontally, i.e. left to right.
```py
dataset, metadata = tfds.load('cycle_gan/horse2zebra',
with_info=True, as_supervised=True)
train_horses, train_zebras = dataset['trainA'], dataset['trainB']
test_horses, test_zebras = dataset['testA'], dataset['testB']
```
```py
Downloading and preparing dataset cycle_gan/horse2zebra/2.0.0 (download: 111.45 MiB, generated: Unknown size, total: 111.45 MiB) to /home/kbuilder/tensorflow_datasets/cycle_gan/horse2zebra/2.0.0...
Shuffling and writing examples to /home/kbuilder/tensorflow_datasets/cycle_gan/horse2zebra/2.0.0.incompleteNSW88L/cycle_gan-trainA.tfrecord
Shuffling and writing examples to /home/kbuilder/tensorflow_datasets/cycle_gan/horse2zebra/2.0.0.incompleteNSW88L/cycle_gan-trainB.tfrecord
Shuffling and writing examples to /home/kbuilder/tensorflow_datasets/cycle_gan/horse2zebra/2.0.0.incompleteNSW88L/cycle_gan-testA.tfrecord
Shuffling and writing examples to /home/kbuilder/tensorflow_datasets/cycle_gan/horse2zebra/2.0.0.incompleteNSW88L/cycle_gan-testB.tfrecord
Dataset cycle_gan downloaded and prepared to /home/kbuilder/tensorflow_datasets/cycle_gan/horse2zebra/2.0.0. Subsequent calls will reuse this data.
```
```py
BUFFER_SIZE = 1000
BATCH_SIZE = 1
IMG_WIDTH = 256
IMG_HEIGHT = 256
```
```py
def random_crop(image):
cropped_image = tf.image.random_crop(
image, size=[IMG_HEIGHT, IMG_WIDTH, 3])
return cropped_image
```
```py
# Normalize the images to the range [-1, 1].
def normalize(image):
image = tf.cast(image, tf.float32)
image = (image / 127.5) - 1
return image
```
```py
def random_jitter(image):
  # Resize to 286 x 286 x 3
image = tf.image.resize(image, [286, 286],
method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
  # Randomly crop to 256 x 256 x 3
image = random_crop(image)
  # Random mirroring
image = tf.image.random_flip_left_right(image)
return image
```
```py
def preprocess_image_train(image, label):
image = random_jitter(image)
image = normalize(image)
return image
```
```py
def preprocess_image_test(image, label):
image = normalize(image)
return image
```
```py
train_horses = train_horses.map(
preprocess_image_train, num_parallel_calls=AUTOTUNE).cache().shuffle(
BUFFER_SIZE).batch(1)
train_zebras = train_zebras.map(
preprocess_image_train, num_parallel_calls=AUTOTUNE).cache().shuffle(
BUFFER_SIZE).batch(1)
test_horses = test_horses.map(
preprocess_image_test, num_parallel_calls=AUTOTUNE).cache().shuffle(
BUFFER_SIZE).batch(1)
test_zebras = test_zebras.map(
preprocess_image_test, num_parallel_calls=AUTOTUNE).cache().shuffle(
BUFFER_SIZE).batch(1)
```
```py
sample_horse = next(iter(train_horses))
sample_zebra = next(iter(train_zebras))
```
```py
plt.subplot(121)
plt.title('Horse')
plt.imshow(sample_horse[0] * 0.5 + 0.5)
plt.subplot(122)
plt.title('Horse with random jitter')
plt.imshow(random_jitter(sample_horse[0]) * 0.5 + 0.5)
```
```py
<matplotlib.image.AxesImage at 0x7f5a600b8048>
```
![png](img/6b843e3001e6a49928fc35d8af4c843d.png)
```py
plt.subplot(121)
plt.title('Zebra')
plt.imshow(sample_zebra[0] * 0.5 + 0.5)
plt.subplot(122)
plt.title('Zebra with random jitter')
plt.imshow(random_jitter(sample_zebra[0]) * 0.5 + 0.5)
```
```py
<matplotlib.image.AxesImage at 0x7f5a101663c8>
```
![png](img/34e85547487e77a52b9e494a05fdc8f8.png)
## Import and reuse the Pix2Pix models
Import the generator and the discriminator used in [Pix2Pix](https://github.com/tensorflow/examples/blob/master/tensorflow_examples/models/pix2pix/pix2pix.py) via the installed [tensorflow_examples](https://github.com/tensorflow/examples) package.
The model architecture used in this tutorial is very similar to the one used in [pix2pix](https://github.com/tensorflow/examples/blob/master/tensorflow_examples/models/pix2pix/pix2pix.py). Some of the differences are:
* CycleGAN uses [instance normalization](https://arxiv.org/abs/1607.08022) instead of [batch normalization](https://arxiv.org/abs/1502.03167).
* The [CycleGAN paper](https://arxiv.org/abs/1703.10593) uses a modified `resnet`-based generator. For simplicity, this tutorial uses a modified `unet` generator.
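As a rough illustration of the first difference (a NumPy sketch, not the `InstanceNormalization` layer the pix2pix package actually provides), instance normalization computes statistics per example over the spatial axes only, while batch normalization also averages over the batch axis:

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    # x: (batch, height, width, channels). Mean/variance are computed per
    # sample and per channel over the spatial axes only, so no statistics
    # are shared across examples in the batch.
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm(x, eps=1e-5):
    # Mean/variance also average over the batch axis, coupling each
    # sample's normalization to the rest of the batch.
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.rand(2, 4, 4, 3)
print(instance_norm(x).shape)  # (2, 4, 4, 3)
```

With a batch size of 1 (as used below), instance normalization avoids making each image's normalization depend on batch statistics.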
Two generators (G and F) and two discriminators (D_X and D_Y) are trained here:
* Generator `G` learns to transform image `X` into image `Y`. $(G: X \rightarrow Y)$
* Generator `F` learns to transform image `Y` into image `X`. $(F: Y \rightarrow X)$
* Discriminator `D_X` learns to differentiate between image `X` and generated image `X` (`F(Y)`).
* Discriminator `D_Y` learns to differentiate between image `Y` and generated image `Y` (`G(X)`).
![Cyclegan 模型](img/141e262e42c195dfe1174f7824ff4c3c.png)
```py
OUTPUT_CHANNELS = 3
generator_g = pix2pix.unet_generator(OUTPUT_CHANNELS, norm_type='instancenorm')
generator_f = pix2pix.unet_generator(OUTPUT_CHANNELS, norm_type='instancenorm')
discriminator_x = pix2pix.discriminator(norm_type='instancenorm', target=False)
discriminator_y = pix2pix.discriminator(norm_type='instancenorm', target=False)
```
```py
to_zebra = generator_g(sample_horse)
to_horse = generator_f(sample_zebra)
plt.figure(figsize=(8, 8))
contrast = 8
imgs = [sample_horse, to_zebra, sample_zebra, to_horse]
title = ['Horse', 'To Zebra', 'Zebra', 'To Horse']
for i in range(len(imgs)):
plt.subplot(2, 2, i+1)
plt.title(title[i])
if i % 2 == 0:
plt.imshow(imgs[i][0] * 0.5 + 0.5)
else:
plt.imshow(imgs[i][0] * 0.5 * contrast + 0.5)
plt.show()
```
```py
WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
```
![png](img/e2143b6a00159c480e078bcbc7c8c72b.png)
```py
plt.figure(figsize=(8, 8))
plt.subplot(121)
plt.title('Is a real zebra?')
plt.imshow(discriminator_y(sample_zebra)[0, ..., -1], cmap='RdBu_r')
plt.subplot(122)
plt.title('Is a real horse?')
plt.imshow(discriminator_x(sample_horse)[0, ..., -1], cmap='RdBu_r')
plt.show()
```
![png](img/6637dace2ef4faea4a327361aec7c4ae.png)
## Loss functions
In CycleGAN there is no paired data to train on, so there is no guarantee that an input `x` and a target `y` form a meaningful pair during training. Thus, to enforce that the network learns the correct mapping, the authors propose the cycle consistency loss.
The discriminator loss and the generator loss are similar to the ones used in [pix2pix](https://tensorflow.google.cn/tutorials/generative/pix2pix#define_the_loss_functions_and_the_optimizer).
```py
LAMBDA = 10
```
```py
loss_obj = tf.keras.losses.BinaryCrossentropy(from_logits=True)
```
```py
def discriminator_loss(real, generated):
real_loss = loss_obj(tf.ones_like(real), real)
generated_loss = loss_obj(tf.zeros_like(generated), generated)
total_disc_loss = real_loss + generated_loss
return total_disc_loss * 0.5
```
```py
def generator_loss(generated):
return loss_obj(tf.ones_like(generated), generated)
```
Cycle consistency means the result should be close to the original input. For example, if one translates a sentence from English to French and then translates it back from French to English, the resulting sentence should be the same as the original one.
In cycle consistency loss,
* Image $X$ is passed via generator $G$, which yields generated image $\hat{Y}$.
* Generated image $\hat{Y}$ is passed via generator $F$, which yields cycled image $\hat{X}$.
* The mean absolute error is calculated between $X$ and $\hat{X}$.
$$forward\ cycle\ consistency\ loss: X \rightarrow G(X) \rightarrow F(G(X)) \sim \hat{X}$$
$$backward\ cycle\ consistency\ loss: Y \rightarrow F(Y) \rightarrow G(F(Y)) \sim \hat{Y}$$
![循环损失](img/4aa12ddc0a8f44acc45b9ed9dc9055bf.png)
```py
def calc_cycle_loss(real_image, cycled_image):
loss1 = tf.reduce_mean(tf.abs(real_image - cycled_image))
return LAMBDA * loss1
```
As shown above, generator $G$ is responsible for translating image $X$ to image $Y$. Identity loss says that if you fed image $Y$ to generator $G$, it should yield the real image $Y$ or something close to it.
$$Identity\ loss = |G(Y) - Y| + |F(X) - X|$$
```py
def identity_loss(real_image, same_image):
loss = tf.reduce_mean(tf.abs(real_image - same_image))
return LAMBDA * 0.5 * loss
```
Initialize the optimizers for all the generators and discriminators.
```py
generator_g_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
generator_f_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
discriminator_x_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
discriminator_y_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
```
## Checkpoints
```py
checkpoint_path = "./checkpoints/train"
ckpt = tf.train.Checkpoint(generator_g=generator_g,
generator_f=generator_f,
discriminator_x=discriminator_x,
discriminator_y=discriminator_y,
generator_g_optimizer=generator_g_optimizer,
generator_f_optimizer=generator_f_optimizer,
discriminator_x_optimizer=discriminator_x_optimizer,
discriminator_y_optimizer=discriminator_y_optimizer)
ckpt_manager = tf.train.CheckpointManager(ckpt, checkpoint_path, max_to_keep=5)
# If a checkpoint exists, restore the latest one.
if ckpt_manager.latest_checkpoint:
ckpt.restore(ckpt_manager.latest_checkpoint)
print ('Latest checkpoint restored!!')
```
## Training
Note: This example model is trained for fewer epochs (40) than the paper (200) to keep the training time reasonable for this tutorial. As a result, predictions may be less accurate.
```py
EPOCHS = 40
```
```py
def generate_images(model, test_input):
prediction = model(test_input)
plt.figure(figsize=(12, 12))
display_list = [test_input[0], prediction[0]]
title = ['Input Image', 'Predicted Image']
for i in range(2):
plt.subplot(1, 2, i+1)
plt.title(title[i])
    # Scale the pixel values to the [0, 1] range to plot them.
plt.imshow(display_list[i] * 0.5 + 0.5)
plt.axis('off')
plt.show()
```
Even though the training loop looks complicated, it consists of four basic steps:
* Get the predictions.
* Calculate the loss.
* Calculate the gradients using backpropagation.
* Apply the gradients to the optimizer.
```py
@tf.function
def train_step(real_x, real_y):
  # persistent is set to True because the tape is used more than once to calculate the gradients.
with tf.GradientTape(persistent=True) as tape:
    # Generator G translates X -> Y.
    # Generator F translates Y -> X.
fake_y = generator_g(real_x, training=True)
cycled_x = generator_f(fake_y, training=True)
fake_x = generator_f(real_y, training=True)
cycled_y = generator_g(fake_x, training=True)
    # same_x and same_y are used for the identity loss.
same_x = generator_f(real_x, training=True)
same_y = generator_g(real_y, training=True)
disc_real_x = discriminator_x(real_x, training=True)
disc_real_y = discriminator_y(real_y, training=True)
disc_fake_x = discriminator_x(fake_x, training=True)
disc_fake_y = discriminator_y(fake_y, training=True)
    # Calculate the losses.
gen_g_loss = generator_loss(disc_fake_y)
gen_f_loss = generator_loss(disc_fake_x)
total_cycle_loss = calc_cycle_loss(real_x, cycled_x) + calc_cycle_loss(real_y, cycled_y)
    # Total generator loss = adversarial loss + cycle loss.
total_gen_g_loss = gen_g_loss + total_cycle_loss + identity_loss(real_y, same_y)
total_gen_f_loss = gen_f_loss + total_cycle_loss + identity_loss(real_x, same_x)
disc_x_loss = discriminator_loss(disc_real_x, disc_fake_x)
disc_y_loss = discriminator_loss(disc_real_y, disc_fake_y)
  # Calculate the gradients for the generators and discriminators.
generator_g_gradients = tape.gradient(total_gen_g_loss,
generator_g.trainable_variables)
generator_f_gradients = tape.gradient(total_gen_f_loss,
generator_f.trainable_variables)
discriminator_x_gradients = tape.gradient(disc_x_loss,
discriminator_x.trainable_variables)
discriminator_y_gradients = tape.gradient(disc_y_loss,
discriminator_y.trainable_variables)
  # Apply the gradients to the optimizers.
generator_g_optimizer.apply_gradients(zip(generator_g_gradients,
generator_g.trainable_variables))
generator_f_optimizer.apply_gradients(zip(generator_f_gradients,
generator_f.trainable_variables))
discriminator_x_optimizer.apply_gradients(zip(discriminator_x_gradients,
discriminator_x.trainable_variables))
discriminator_y_optimizer.apply_gradients(zip(discriminator_y_gradients,
discriminator_y.trainable_variables))
```
```py
for epoch in range(EPOCHS):
start = time.time()
n = 0
for image_x, image_y in tf.data.Dataset.zip((train_horses, train_zebras)):
train_step(image_x, image_y)
if n % 10 == 0:
print ('.', end='')
n+=1
clear_output(wait=True)
  # Use a consistent image (sample_horse) so that the model's progress is clearly visible.
generate_images(generator_g, sample_horse)
if (epoch + 1) % 5 == 0:
ckpt_save_path = ckpt_manager.save()
print ('Saving checkpoint for epoch {} at {}'.format(epoch+1,
ckpt_save_path))
print ('Time taken for epoch {} is {} sec\n'.format(epoch + 1,
time.time()-start))
```
![png](img/c2a117375845a6a7d1c87b2c84de54e8.png)
```py
Saving checkpoint for epoch 40 at ./checkpoints/train/ckpt-8
Time taken for epoch 40 is 175.41231870651245 sec
```
## Generate using the test dataset
```py
# Run the trained model on the test dataset.
for inp in test_horses.take(5):
generate_images(generator_g, inp)
```
![png](img/d68f92600680dfc45d965045e843ec4d.png)
![png](img/0ba1e7316ba7e228576bbcd85280c309.png)
![png](img/33043d022bdb4912f00756593d5b4a7c.png)
![png](img/032dc17ad0509afd4505858b1f0c7d19.png)
![png](img/d653a0d6330958d36f31b35e1410ff6d.png)
## Next steps
This tutorial has shown how to implement CycleGAN starting from the generator and discriminator implemented in the [Pix2Pix](https://tensorflow.google.cn/tutorials/generative/pix2pix) tutorial. As a next step, you could try using a different dataset from [TensorFlow Datasets](https://tensorflow.google.cn/datasets/datasets#cycle_gan).
You could also train for a larger number of epochs to improve the results, or you could implement the modified ResNet generator used in the [paper](https://arxiv.org/abs/1703.10593) instead of the U-Net generator used here.

# Adversarial example using FGSM
> 原文:[https://tensorflow.google.cn/tutorials/generative/adversarial_fgsm](https://tensorflow.google.cn/tutorials/generative/adversarial_fgsm)
This tutorial creates an *adversarial example* using the Fast Gradient Signed Method (FGSM) attack as described in [Explaining and Harnessing Adversarial Examples](https://arxiv.org/abs/1412.6572) by Goodfellow *et al*. This was one of the first and most popular attacks to fool a neural network.
## What is an adversarial example?
Adversarial examples are specialised inputs created with the purpose of confusing a neural network, resulting in the misclassification of a given input. These notorious inputs are indistinguishable to the human eye, but cause the network to fail to identify the contents of the image. There are several types of such attacks, however, here the focus is on the fast gradient sign method attack, which is a *white box* attack whose goal is to ensure misclassification. A white box attack is where the attacker has complete access to the model being attacked. One of the most famous examples of an adversarial image shown below is taken from the aforementioned paper.
![Adversarial Example](img/ac69959225a206f2b2c5ed2e33218511.png)
Here, starting with the image of a panda, the attacker adds small perturbations (distortions) to the original image, which results in the model labelling this image as a gibbon, with high confidence. The process of adding these perturbations is explained below.
## Fast gradient sign method
The fast gradient sign method works by using the gradients of the neural network to create an adversarial example. For an input image, the method uses the gradients of the loss with respect to the input image to create a new image that maximises the loss. This new image is called the adversarial image. This can be summarised using the following expression:
$$adv\_x = x + \epsilon*\text{sign}(\nabla_xJ(\theta, x, y))$$
where
* adv_x : Adversarial image.
* x : Original input image.
* y : Original input label.
* $\epsilon$ : Multiplier to ensure the perturbations are small.
* $\theta$ : Model parameters.
* $J$ : Loss.
An intriguing property here is the fact that the gradients are taken with respect to the input image. This is done because the objective is to create an image that maximises the loss. A method to accomplish this is to find how much each pixel in the image contributes to the loss value, and add a perturbation accordingly. This works pretty fast because it is easy to find how each input pixel contributes to the loss by using the chain rule and finding the required gradients. In addition, since the model is no longer being trained (thus the gradient is not taken with respect to the trainable variables, i.e., the model parameters), the model parameters remain constant. The only goal is to fool an already trained model.
So let's try and fool a pretrained model. In this tutorial, the model is [MobileNetV2](https://tensorflow.google.cn/versions/r2.0/api_docs/python/tf/keras/applications/MobileNetV2) model, pretrained on [ImageNet](http://www.image-net.org/).
```py
import tensorflow as tf
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rcParams['figure.figsize'] = (8, 8)
mpl.rcParams['axes.grid'] = False
```
Let's load the pretrained MobileNetV2 model and the ImageNet class names.
```py
pretrained_model = tf.keras.applications.MobileNetV2(include_top=True,
weights='imagenet')
pretrained_model.trainable = False
# ImageNet labels
decode_predictions = tf.keras.applications.mobilenet_v2.decode_predictions
```
```py
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_224.h5
14540800/14536120 [==============================] - 0s 0us/step
```
```py
# Helper function to preprocess the image so that it can be inputted in MobileNetV2
def preprocess(image):
image = tf.cast(image, tf.float32)
image = tf.image.resize(image, (224, 224))
image = tf.keras.applications.mobilenet_v2.preprocess_input(image)
image = image[None, ...]
return image
# Helper function to extract labels from probability vector
def get_imagenet_label(probs):
return decode_predictions(probs, top=1)[0][0]
```
## Original image
Let's use a sample image of a [Labrador Retriever](https://commons.wikimedia.org/wiki/File:YellowLabradorLooking_new.jpg) by Mirko [CC-BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/) from Wikimedia Commons and create adversarial examples from it. The first step is to preprocess it so that it can be fed as an input to the MobileNetV2 model.
```py
image_path = tf.keras.utils.get_file('YellowLabradorLooking_new.jpg', 'https://storage.googleapis.com/download.tensorflow.org/example_images/YellowLabradorLooking_new.jpg')
image_raw = tf.io.read_file(image_path)
image = tf.image.decode_image(image_raw)
image = preprocess(image)
image_probs = pretrained_model.predict(image)
```
```py
Downloading data from https://storage.googleapis.com/download.tensorflow.org/example_images/YellowLabradorLooking_new.jpg
90112/83281 [================================] - 0s 0us/step
```
Let's have a look at the image.
```py
plt.figure()
plt.imshow(image[0]*0.5+0.5) # To change [-1, 1] to [0,1]
_, image_class, class_confidence = get_imagenet_label(image_probs)
plt.title('{} : {:.2f}% Confidence'.format(image_class, class_confidence*100))
plt.show()
```
```py
Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/imagenet_class_index.json
40960/35363 [==================================] - 0s 0us/step
```
![png](img/1c498df577bb9dd0638c25332e7b68a1.png)
## Create the adversarial image
### Implementing fast gradient sign method
The first step is to create perturbations which will be used to distort the original image resulting in an adversarial image. As mentioned, for this task, the gradients are taken with respect to the image.
```py
loss_object = tf.keras.losses.CategoricalCrossentropy()
def create_adversarial_pattern(input_image, input_label):
with tf.GradientTape() as tape:
tape.watch(input_image)
prediction = pretrained_model(input_image)
loss = loss_object(input_label, prediction)
# Get the gradients of the loss w.r.t to the input image.
gradient = tape.gradient(loss, input_image)
# Get the sign of the gradients to create the perturbation
signed_grad = tf.sign(gradient)
return signed_grad
```
The resulting perturbations can also be visualised.
```py
# Get the input label of the image.
labrador_retriever_index = 208
label = tf.one_hot(labrador_retriever_index, image_probs.shape[-1])
label = tf.reshape(label, (1, image_probs.shape[-1]))
perturbations = create_adversarial_pattern(image, label)
plt.imshow(perturbations[0]*0.5+0.5); # To change [-1, 1] to [0,1]
```
![png](img/e3ffe6a29488821b01dd98cba6690e5f.png)
Let's try this out for different values of epsilon and observe the resultant image. You'll notice that as the value of epsilon is increased, it becomes easier to fool the network. However, this comes as a trade-off which results in the perturbations becoming more identifiable.
```py
def display_images(image, description):
_, label, confidence = get_imagenet_label(pretrained_model.predict(image))
plt.figure()
plt.imshow(image[0]*0.5+0.5)
plt.title('{} \n {} : {:.2f}% Confidence'.format(description,
label, confidence*100))
plt.show()
```
```py
epsilons = [0, 0.01, 0.1, 0.15]
descriptions = [('Epsilon = {:0.3f}'.format(eps) if eps else 'Input')
for eps in epsilons]
for i, eps in enumerate(epsilons):
adv_x = image + eps*perturbations
adv_x = tf.clip_by_value(adv_x, -1, 1)
display_images(adv_x, descriptions[i])
```
![png](img/8aa1d48ada55b367535dbe964ad2cd79.png)
![png](img/4bebff99ef427fe52c09346e6f6b1971.png)
![png](img/7fb60d07e3fa3bd88b02197b1f12223f.png)
![png](img/66503afc507478f400022c625de3c878.png)
## Next steps
Now that you know about adversarial attacks, try this out on different datasets and different architectures. You may also create and train your own model, and then attempt to fool it using the same method. You can also try and see how the confidence in predictions vary as you change epsilon.
Though powerful, the attack shown in this tutorial was just the start of research into adversarial attacks, and there have been multiple papers creating more powerful attacks since then. In addition to adversarial attacks, research has also led to the creation of defenses, which aims at creating robust machine learning models. You may review this [survey paper](https://arxiv.org/abs/1810.00069) for a comprehensive list of adversarial attacks and defences.
For many more implementations of adversarial attacks and defenses, you may want to see the adversarial example library [CleverHans](https://github.com/tensorflow/cleverhans).

# Intro to Autoencoders
> 原文:[https://tensorflow.google.cn/tutorials/generative/autoencoder](https://tensorflow.google.cn/tutorials/generative/autoencoder)
This tutorial introduces autoencoders with three examples: the basics, image denoising, and anomaly detection.
An autoencoder is a special type of neural network that is trained to copy its input to its output. For example, given an image of a handwritten digit, an autoencoder first encodes the image into a lower dimensional latent representation, then decodes the latent representation back to an image. An autoencoder learns to compress the data while minimizing the reconstruction error.
To learn more about autoencoders, please consider reading chapter 14 from [Deep Learning](https://www.deeplearningbook.org/) by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
## Import TensorFlow and other libraries
```py
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers, losses
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.models import Model
```
## Load the dataset
To start, you will train the basic autoencoder using the Fashion MNIST dataset. Each image in this dataset is 28x28 pixels.
```py
(x_train, _), (x_test, _) = fashion_mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
print (x_train.shape)
print (x_test.shape)
```
```py
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
32768/29515 [=================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
26427392/26421880 [==============================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
8192/5148 [===============================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
4423680/4422102 [==============================] - 0s 0us/step
(60000, 28, 28)
(10000, 28, 28)
```
## First example: Basic autoencoder
![Basic autoencoder results](img/ee409d59cd41f3de0f02655abfc4d0c0.png)
Define an autoencoder with two Dense layers: an `encoder`, which compresses the images into a 64 dimensional latent vector, and a `decoder`, which reconstructs the original image from the latent space.
To define your model, use the [Keras Model Subclassing API](https://tensorflow.google.cn/guide/keras/custom_layers_and_models).
```py
latent_dim = 64
class Autoencoder(Model):
def __init__(self, latent_dim):
super(Autoencoder, self).__init__()
self.latent_dim = latent_dim
self.encoder = tf.keras.Sequential([
layers.Flatten(),
layers.Dense(latent_dim, activation='relu'),
])
self.decoder = tf.keras.Sequential([
layers.Dense(784, activation='sigmoid'),
layers.Reshape((28, 28))
])
def call(self, x):
encoded = self.encoder(x)
decoded = self.decoder(encoded)
return decoded
autoencoder = Autoencoder(latent_dim)
```
```py
autoencoder.compile(optimizer='adam', loss=losses.MeanSquaredError())
```
Train the model using `x_train` as both the input and the target. The `encoder` will learn to compress the dataset from 784 dimensions to the latent space, and the `decoder` will learn to reconstruct the original images.
```py
autoencoder.fit(x_train, x_train,
epochs=10,
shuffle=True,
validation_data=(x_test, x_test))
```
```py
Epoch 1/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0239 - val_loss: 0.0132
Epoch 2/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0116 - val_loss: 0.0105
Epoch 3/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0100 - val_loss: 0.0097
Epoch 4/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0095 - val_loss: 0.0094
Epoch 5/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0092 - val_loss: 0.0092
Epoch 6/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0090 - val_loss: 0.0091
Epoch 7/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0090 - val_loss: 0.0090
Epoch 8/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0089 - val_loss: 0.0090
Epoch 9/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0088 - val_loss: 0.0090
Epoch 10/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0088 - val_loss: 0.0088
<tensorflow.python.keras.callbacks.History at 0x7f220fe53fd0>
```
Now that the model is trained, let's test it by encoding and decoding images from the test set.
```py
encoded_imgs = autoencoder.encoder(x_test).numpy()
decoded_imgs = autoencoder.decoder(encoded_imgs).numpy()
```
```py
n = 10
plt.figure(figsize=(20, 4))
for i in range(n):
# display original
ax = plt.subplot(2, n, i + 1)
plt.imshow(x_test[i])
plt.title("original")
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
# display reconstruction
ax = plt.subplot(2, n, i + 1 + n)
plt.imshow(decoded_imgs[i])
plt.title("reconstructed")
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()
```
![png](img/c239b9ad6cf9b5f72e3d6d37fd17b9d1.png)
## Second example: Image denoising
![Image denoising results](img/9461d6f88eb7d390eea25f1f034101b5.png)
An autoencoder can also be trained to remove noise from images. In the following section, you will create a noisy version of the Fashion MNIST dataset by applying random noise to each image. You will then train an autoencoder using the noisy image as input, and the original image as the target.
Let's reimport the dataset to omit the modifications made earlier.
```py
(x_train, _), (x_test, _) = fashion_mnist.load_data()
```
```py
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train[..., tf.newaxis]
x_test = x_test[..., tf.newaxis]
print(x_train.shape)
```
```py
(60000, 28, 28, 1)
```
Adding random noise to the images
```py
noise_factor = 0.2
x_train_noisy = x_train + noise_factor * tf.random.normal(shape=x_train.shape)
x_test_noisy = x_test + noise_factor * tf.random.normal(shape=x_test.shape)
x_train_noisy = tf.clip_by_value(x_train_noisy, clip_value_min=0., clip_value_max=1.)
x_test_noisy = tf.clip_by_value(x_test_noisy, clip_value_min=0., clip_value_max=1.)
```
Plot the noisy images.
```py
n = 10
plt.figure(figsize=(20, 2))
for i in range(n):
ax = plt.subplot(1, n, i + 1)
plt.title("original + noise")
plt.imshow(tf.squeeze(x_test_noisy[i]))
plt.gray()
plt.show()
```
![png](img/6c3e8444c64a773d92f67fd4f07992b7.png)
### Define a convolutional autoencoder
In this example, you will train a convolutional autoencoder using [Conv2D](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/Conv2D) layers in the `encoder`, and [Conv2DTranspose](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/Conv2DTranspose) layers in the `decoder`.
```py
class Denoise(Model):
def __init__(self):
super(Denoise, self).__init__()
self.encoder = tf.keras.Sequential([
layers.Input(shape=(28, 28, 1)),
layers.Conv2D(16, (3,3), activation='relu', padding='same', strides=2),
layers.Conv2D(8, (3,3), activation='relu', padding='same', strides=2)])
self.decoder = tf.keras.Sequential([
layers.Conv2DTranspose(8, kernel_size=3, strides=2, activation='relu', padding='same'),
layers.Conv2DTranspose(16, kernel_size=3, strides=2, activation='relu', padding='same'),
layers.Conv2D(1, kernel_size=(3,3), activation='sigmoid', padding='same')])
def call(self, x):
encoded = self.encoder(x)
decoded = self.decoder(encoded)
return decoded
autoencoder = Denoise()
```
```py
autoencoder.compile(optimizer='adam', loss=losses.MeanSquaredError())
```
```py
autoencoder.fit(x_train_noisy, x_train,
epochs=10,
shuffle=True,
validation_data=(x_test_noisy, x_test))
```
```py
Epoch 1/10
1875/1875 [==============================] - 26s 14ms/step - loss: 0.0182 - val_loss: 0.0112
Epoch 2/10
1875/1875 [==============================] - 25s 14ms/step - loss: 0.0100 - val_loss: 0.0093
Epoch 3/10
1875/1875 [==============================] - 25s 13ms/step - loss: 0.0089 - val_loss: 0.0087
Epoch 4/10
1875/1875 [==============================] - 26s 14ms/step - loss: 0.0085 - val_loss: 0.0084
Epoch 5/10
1875/1875 [==============================] - 25s 14ms/step - loss: 0.0083 - val_loss: 0.0083
Epoch 6/10
1875/1875 [==============================] - 26s 14ms/step - loss: 0.0082 - val_loss: 0.0082
Epoch 7/10
1875/1875 [==============================] - 25s 13ms/step - loss: 0.0081 - val_loss: 0.0081
Epoch 8/10
1875/1875 [==============================] - 25s 13ms/step - loss: 0.0081 - val_loss: 0.0080
Epoch 9/10
1875/1875 [==============================] - 25s 13ms/step - loss: 0.0080 - val_loss: 0.0080
Epoch 10/10
1875/1875 [==============================] - 25s 14ms/step - loss: 0.0079 - val_loss: 0.0080
<tensorflow.python.keras.callbacks.History at 0x7f22122b45c0>
```
Let's take a look at a summary of the encoder. Notice how the images are downsampled from 28x28 to 7x7.
```py
autoencoder.encoder.summary()
```
```py
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 14, 14, 16) 160
_________________________________________________________________
conv2d_1 (Conv2D) (None, 7, 7, 8) 1160
=================================================================
Total params: 1,320
Trainable params: 1,320
Non-trainable params: 0
_________________________________________________________________
```
The decoder upsamples the images back from 7x7 to 28x28.
```py
autoencoder.decoder.summary()
```
```py
Model: "sequential_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_transpose (Conv2DTran (None, 14, 14, 8) 584
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 28, 28, 16) 1168
_________________________________________________________________
conv2d_2 (Conv2D) (None, 28, 28, 1) 145
=================================================================
Total params: 1,897
Trainable params: 1,897
Non-trainable params: 0
_________________________________________________________________
```
Plotting both the noisy images and the denoised images produced by the autoencoder.
```py
encoded_imgs = autoencoder.encoder(x_test_noisy).numpy()
decoded_imgs = autoencoder.decoder(encoded_imgs).numpy()
```
```py
n = 10
plt.figure(figsize=(20, 4))
for i in range(n):
# display original + noise
ax = plt.subplot(2, n, i + 1)
plt.title("original + noise")
plt.imshow(tf.squeeze(x_test_noisy[i]))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
# display reconstruction
bx = plt.subplot(2, n, i + n + 1)
plt.title("reconstructed")
plt.imshow(tf.squeeze(decoded_imgs[i]))
plt.gray()
bx.get_xaxis().set_visible(False)
bx.get_yaxis().set_visible(False)
plt.show()
```
![png](img/d6d4178e447bc9f8c984345c73202b01.png)
## Third example: Anomaly detection
## Overview
In this example, you will train an autoencoder to detect anomalies on the [ECG5000 dataset](http://www.timeseriesclassification.com/description.php?Dataset=ECG5000). This dataset contains 5,000 [Electrocardiograms](https://en.wikipedia.org/wiki/Electrocardiography), each with 140 data points. You will use a simplified version of the dataset, where each example has been labeled either `0` (corresponding to an abnormal rhythm), or `1` (corresponding to a normal rhythm). You are interested in identifying the abnormal rhythms.
**Note:** This is a labeled dataset, so you could phrase this as a supervised learning problem. The goal of this example is to illustrate anomaly detection concepts you can apply to larger datasets, where you do not have labels available (for example, if you had many thousands of normal rhythms, and only a small number of abnormal rhythms).
How will you detect anomalies using an autoencoder? Recall that an autoencoder is trained to minimize reconstruction error. You will train an autoencoder on the normal rhythms only, then use it to reconstruct all the data. Our hypothesis is that the abnormal rhythms will have higher reconstruction error. You will then classify a rhythm as an anomaly if the reconstruction error surpasses a fixed threshold.
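The thresholding idea can be sketched in a few lines of NumPy (an illustration with synthetic residuals, not the tutorial's model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-point reconstruction residuals: the autoencoder reconstructs
# normal examples well (small residuals) and anomalous ones poorly (large ones).
normal_residuals = rng.normal(0.0, 0.01, size=(100, 140))
anomalous_residuals = rng.normal(0.0, 0.1, size=(100, 140))

# Mean absolute reconstruction error per example.
normal_err = np.abs(normal_residuals).mean(axis=1)
anomalous_err = np.abs(anomalous_residuals).mean(axis=1)

# Fixed threshold: mean plus one standard deviation of the normal errors.
threshold = normal_err.mean() + normal_err.std()

# Anything whose reconstruction error surpasses the threshold is flagged.
flagged = anomalous_err > threshold
```

With residuals this well separated, every synthetic anomaly lands above the threshold; real ECG data is noisier, which is why precision and recall are inspected later.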
### Load ECG data
The dataset you will use is based on one from [timeseriesclassification.com](http://www.timeseriesclassification.com/description.php?Dataset=ECG5000).
```py
# Download the dataset
dataframe = pd.read_csv('http://storage.googleapis.com/download.tensorflow.org/data/ecg.csv', header=None)
raw_data = dataframe.values
dataframe.head()
```
<devsite-iframe><iframe src="/tutorials/generative/autoencoder_d1e264d3aef03a2f0ce0c60938dad7e5c8bc047c81aeacdbf265389b3baf6cfe.frame" class="framebox inherit-locale " allowfullscreen="" is-upgraded=""></iframe></devsite-iframe>
```py
# The last element contains the labels
labels = raw_data[:, -1]
# The other data points are the electrocardiogram data
data = raw_data[:, 0:-1]
train_data, test_data, train_labels, test_labels = train_test_split(
data, labels, test_size=0.2, random_state=21
)
```
Normalize the data to `[0,1]`.
```py
min_val = tf.reduce_min(train_data)
max_val = tf.reduce_max(train_data)
train_data = (train_data - min_val) / (max_val - min_val)
test_data = (test_data - min_val) / (max_val - min_val)
train_data = tf.cast(train_data, tf.float32)
test_data = tf.cast(test_data, tf.float32)
```
You will train the autoencoder using only the normal rhythms, which are labeled in this dataset as `1`. Separate the normal rhythms from the abnormal rhythms.
```py
train_labels = train_labels.astype(bool)
test_labels = test_labels.astype(bool)
normal_train_data = train_data[train_labels]
normal_test_data = test_data[test_labels]
anomalous_train_data = train_data[~train_labels]
anomalous_test_data = test_data[~test_labels]
```
Plot a normal ECG.
```py
plt.grid()
plt.plot(np.arange(140), normal_train_data[0])
plt.title("A Normal ECG")
plt.show()
```
![png](img/aef2c569f7fec52ed4d6e656dddb8da4.png)
Plot an anomalous ECG.
```py
plt.grid()
plt.plot(np.arange(140), anomalous_train_data[0])
plt.title("An Anomalous ECG")
plt.show()
```
![png](img/7e31e526f055ddde2fd0d3a4e5d60aef.png)
### Build the model
```py
class AnomalyDetector(Model):
def __init__(self):
super(AnomalyDetector, self).__init__()
self.encoder = tf.keras.Sequential([
layers.Dense(32, activation="relu"),
layers.Dense(16, activation="relu"),
layers.Dense(8, activation="relu")])
self.decoder = tf.keras.Sequential([
layers.Dense(16, activation="relu"),
layers.Dense(32, activation="relu"),
layers.Dense(140, activation="sigmoid")])
def call(self, x):
encoded = self.encoder(x)
decoded = self.decoder(encoded)
return decoded
autoencoder = AnomalyDetector()
```
```py
autoencoder.compile(optimizer='adam', loss='mae')
```
Notice that the autoencoder is trained using only the normal ECGs, but is evaluated using the full test set.
```py
history = autoencoder.fit(normal_train_data, normal_train_data,
epochs=20,
batch_size=512,
validation_data=(test_data, test_data),
shuffle=True)
```
```py
Epoch 1/20
5/5 [==============================] - 0s 20ms/step - loss: 0.0582 - val_loss: 0.0534
Epoch 2/20
5/5 [==============================] - 0s 5ms/step - loss: 0.0564 - val_loss: 0.0519
Epoch 3/20
5/5 [==============================] - 0s 5ms/step - loss: 0.0540 - val_loss: 0.0508
Epoch 4/20
5/5 [==============================] - 0s 5ms/step - loss: 0.0514 - val_loss: 0.0491
Epoch 5/20
5/5 [==============================] - 0s 5ms/step - loss: 0.0482 - val_loss: 0.0467
Epoch 6/20
5/5 [==============================] - 0s 5ms/step - loss: 0.0448 - val_loss: 0.0449
Epoch 7/20
5/5 [==============================] - 0s 5ms/step - loss: 0.0414 - val_loss: 0.0429
Epoch 8/20
5/5 [==============================] - 0s 5ms/step - loss: 0.0380 - val_loss: 0.0413
Epoch 9/20
5/5 [==============================] - 0s 5ms/step - loss: 0.0345 - val_loss: 0.0400
Epoch 10/20
5/5 [==============================] - 0s 5ms/step - loss: 0.0316 - val_loss: 0.0390
Epoch 11/20
5/5 [==============================] - 0s 5ms/step - loss: 0.0293 - val_loss: 0.0382
Epoch 12/20
5/5 [==============================] - 0s 5ms/step - loss: 0.0276 - val_loss: 0.0379
Epoch 13/20
5/5 [==============================] - 0s 5ms/step - loss: 0.0262 - val_loss: 0.0370
Epoch 14/20
5/5 [==============================] - 0s 5ms/step - loss: 0.0251 - val_loss: 0.0366
Epoch 15/20
5/5 [==============================] - 0s 5ms/step - loss: 0.0244 - val_loss: 0.0359
Epoch 16/20
5/5 [==============================] - 0s 5ms/step - loss: 0.0237 - val_loss: 0.0355
Epoch 17/20
5/5 [==============================] - 0s 5ms/step - loss: 0.0231 - val_loss: 0.0352
Epoch 18/20
5/5 [==============================] - 0s 5ms/step - loss: 0.0225 - val_loss: 0.0345
Epoch 19/20
5/5 [==============================] - 0s 5ms/step - loss: 0.0219 - val_loss: 0.0343
Epoch 20/20
5/5 [==============================] - 0s 5ms/step - loss: 0.0214 - val_loss: 0.0341
```
```py
plt.plot(history.history["loss"], label="Training Loss")
plt.plot(history.history["val_loss"], label="Validation Loss")
plt.legend()
```
```py
<matplotlib.legend.Legend at 0x7f21d014f438>
```
![png](img/062d680b7bfc538f75dbd6e3d7562502.png)
You will soon classify an ECG as anomalous if the reconstruction error is greater than one standard deviation from the normal training examples. First, let's plot a normal ECG from the training set, the reconstruction after it's encoded and decoded by the autoencoder, and the reconstruction error.
```py
encoded_imgs = autoencoder.encoder(normal_test_data).numpy()
decoded_imgs = autoencoder.decoder(encoded_imgs).numpy()
plt.plot(normal_test_data[0],'b')
plt.plot(decoded_imgs[0],'r')
plt.fill_between(np.arange(140), decoded_imgs[0], normal_test_data[0], color='lightcoral' )
plt.legend(labels=["Input", "Reconstruction", "Error"])
plt.show()
```
![png](img/8f8b815630d4213a923f492eacc9d2d0.png)
Create a similar plot, this time for an anomalous test example.
```py
encoded_imgs = autoencoder.encoder(anomalous_test_data).numpy()
decoded_imgs = autoencoder.decoder(encoded_imgs).numpy()
plt.plot(anomalous_test_data[0],'b')
plt.plot(decoded_imgs[0],'r')
plt.fill_between(np.arange(140), decoded_imgs[0], anomalous_test_data[0], color='lightcoral' )
plt.legend(labels=["Input", "Reconstruction", "Error"])
plt.show()
```
![png](img/65e3cc57565dea4503cb5f3f7dca3035.png)
### Detect anomalies
Detect anomalies by calculating whether the reconstruction loss is greater than a fixed threshold. In this tutorial, you will calculate the mean absolute error for normal examples from the training set, then classify future examples as anomalous if the reconstruction error is higher than one standard deviation from the training set.
Plot the reconstruction error on normal ECGs from the training set.
```py
reconstructions = autoencoder.predict(normal_train_data)
train_loss = tf.keras.losses.mae(reconstructions, normal_train_data)
plt.hist(train_loss, bins=50)
plt.xlabel("Train loss")
plt.ylabel("No of examples")
plt.show()
```
![png](img/17b66fa7e9565fdeabc4fe4752bad60d.png)
Choose a threshold value that is one standard deviation above the mean.
```py
threshold = np.mean(train_loss) + np.std(train_loss)
print("Threshold: ", threshold)
```
```py
Threshold: 0.033377893
```
**Note:** There are other strategies you could use to select a threshold value above which test examples should be classified as anomalous; the correct approach will depend on your dataset. You can learn more with the links at the end of this tutorial.
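As a sketch of one such alternative (using illustrative synthetic losses, not the tutorial's data), you could compare the mean-plus-one-standard-deviation rule against picking a high percentile of the training losses:

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical per-example training reconstruction losses.
train_loss = rng.gamma(shape=2.0, scale=0.01, size=1000)

# The rule used in this tutorial: mean plus one standard deviation...
threshold_std = train_loss.mean() + train_loss.std()

# ...versus flagging only errors above the 95th percentile of training losses.
threshold_pct = np.percentile(train_loss, 95)
```

A higher threshold trades recall for precision; cross-validating the choice against labeled anomalies (when you have any) is another common strategy.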
If you examine the reconstruction error for the anomalous examples in the test set, you'll notice most have greater reconstruction error than the threshold. By varying the threshold, you can adjust the [precision](https://developers.google.cn/machine-learning/glossary#precision) and [recall](https://developers.google.cn/machine-learning/glossary#recall) of your classifier.
```py
reconstructions = autoencoder.predict(anomalous_test_data)
test_loss = tf.keras.losses.mae(reconstructions, anomalous_test_data)
plt.hist(test_loss, bins=50)
plt.xlabel("Test loss")
plt.ylabel("No of examples")
plt.show()
```
![png](img/f9843723cb76f7e84a4d3e7435c3a2c0.png)
Classify an ECG as an anomaly if the reconstruction error is greater than the threshold.
```py
def predict(model, data, threshold):
  reconstructions = model(data)
  loss = tf.keras.losses.mae(reconstructions, data)
  return tf.math.less(loss, threshold)

def print_stats(predictions, labels):
  print("Accuracy = {}".format(accuracy_score(labels, predictions)))
  print("Precision = {}".format(precision_score(labels, predictions)))
  print("Recall = {}".format(recall_score(labels, predictions)))
```
```py
preds = predict(autoencoder, test_data, threshold)
print_stats(preds, test_labels)
```
```py
Accuracy = 0.944
Precision = 0.9921875
Recall = 0.9071428571428571
```
## Next steps
To learn more about anomaly detection with autoencoders, check out this excellent [interactive example](https://anomagram.fastforwardlabs.com/#/) built with TensorFlow.js by Victor Dibia. For a real-world use case, you can learn how [Airbus Detects Anomalies in ISS Telemetry Data](https://blog.tensorflow.org/2020/04/how-airbus-detects-anomalies-iss-telemetry-data-tfx.html) using TensorFlow. To learn more about the basics, consider reading this [blog post](https://blog.keras.io/building-autoencoders-in-keras.html) by François Chollet. For more details, check out chapter 14 from [Deep Learning](https://www.deeplearningbook.org/) by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
# Convolutional Variational Autoencoder
> Original: [https://tensorflow.google.cn/tutorials/generative/cvae](https://tensorflow.google.cn/tutorials/generative/cvae)
<devsite-mathjax config="TeX-AMS-MML_SVG"></devsite-mathjax>
**Note:** Our TensorFlow community has translated these documents. Because community translations are best-effort, there is no guarantee that they are accurate and reflect the latest [official English documentation](https://tensorflow.google.cn/?hl=en). If you have suggestions to improve this translation, please submit a pull request to the [tensorflow/docs](https://github.com/tensorflow/docs) GitHub repository. To volunteer to write or review translations, join the [docs-zh-cn@tensorflow.org Google Group](https://groups.google.com/a/tensorflow.org/forum/#!forum/docs-zh-cn).
![Evolution of the output during training](img/82444fa7539ed0a798d9a1de5aaf147b.png)
This notebook demonstrates how to generate images of handwritten digits by training a Variational Autoencoder ([1](https://arxiv.org/abs/1312.6114), [2](https://arxiv.org/abs/1401.4082)).
```py
# To generate GIFs
pip install -q imageio
```
```py
WARNING: You are using pip version 20.2.2; however, version 20.2.3 is available.
You should consider upgrading via the '/tmpfs/src/tf_docs_env/bin/python -m pip install --upgrade pip' command.
```
## Import TensorFlow and other libraries
```py
import tensorflow as tf
import os
import time
import numpy as np
import glob
import matplotlib.pyplot as plt
import PIL
import imageio
from IPython import display
```
## Load the MNIST dataset
Each MNIST image is originally a vector of 784 integers, each of which is between 0-255 and represents the intensity of a pixel. We model each pixel with a Bernoulli distribution in our model, and we statically binarize the dataset.
```py
(train_images, _), (test_images, _) = tf.keras.datasets.mnist.load_data()
```
```py
train_images = train_images.reshape(train_images.shape[0], 28, 28, 1).astype('float32')
test_images = test_images.reshape(test_images.shape[0], 28, 28, 1).astype('float32')
# Normalize the images to the range [0., 1.]
train_images /= 255.
test_images /= 255.
# Binarize
train_images[train_images >= .5] = 1.
train_images[train_images < .5] = 0.
test_images[test_images >= .5] = 1.
test_images[test_images < .5] = 0.
```
```py
TRAIN_BUF = 60000
BATCH_SIZE = 100
TEST_BUF = 10000
```
## Use *tf.data* to batch and shuffle the data
```py
train_dataset = tf.data.Dataset.from_tensor_slices(train_images).shuffle(TRAIN_BUF).batch(BATCH_SIZE)
test_dataset = tf.data.Dataset.from_tensor_slices(test_images).shuffle(TEST_BUF).batch(BATCH_SIZE)
```
## Wire up the generative and inference network with *tf.keras.Sequential*
In our VAE example, we use two small ConvNets for the generative and inference network. Since these neural nets are small, we use [`tf.keras.Sequential`](https://tensorflow.google.cn/api_docs/python/tf/keras/Sequential) to simplify our code. Let $x$ and $z$ denote the observation and latent variable respectively in the following descriptions.
### Generative network
This defines the generative model, which takes a latent encoding as input and outputs the parameters for a conditional distribution of the observation, i.e. $p(x|z)$. Additionally, we use a unit Gaussian prior $p(z)$ for the latent variable.
### Inference network
This defines an approximate posterior distribution $q(z|x)$, which takes as input an observation and outputs a set of parameters for the conditional distribution of the latent representation. In this example, we simply model this distribution as a diagonal Gaussian. In this case, the inference network outputs the mean and log-variance parameters of a factorized Gaussian (log-variance instead of the variance directly, for numerical stability).
### Reparameterization trick
During optimization, we can sample from $q(z|x)$ by first sampling from a unit Gaussian, and then multiplying by the standard deviation and adding the mean. This ensures the gradients can pass through the sample to the inference network parameters.
### Network architecture
For the inference network, we use two convolutional layers followed by a fully-connected layer. In the generative network, we mirror this architecture by using a fully-connected layer followed by three convolution transpose layers (a.k.a. deconvolutional layers in some contexts). Note, it's common practice to avoid using batch normalization when training VAEs, since the additional stochasticity due to using mini-batches may aggravate instability on top of the stochasticity from sampling.
```py
class CVAE(tf.keras.Model):
def __init__(self, latent_dim):
super(CVAE, self).__init__()
self.latent_dim = latent_dim
self.inference_net = tf.keras.Sequential(
[
tf.keras.layers.InputLayer(input_shape=(28, 28, 1)),
tf.keras.layers.Conv2D(
filters=32, kernel_size=3, strides=(2, 2), activation='relu'),
tf.keras.layers.Conv2D(
filters=64, kernel_size=3, strides=(2, 2), activation='relu'),
tf.keras.layers.Flatten(),
# No activation
tf.keras.layers.Dense(latent_dim + latent_dim),
]
)
self.generative_net = tf.keras.Sequential(
[
tf.keras.layers.InputLayer(input_shape=(latent_dim,)),
tf.keras.layers.Dense(units=7*7*32, activation=tf.nn.relu),
tf.keras.layers.Reshape(target_shape=(7, 7, 32)),
tf.keras.layers.Conv2DTranspose(
filters=64,
kernel_size=3,
strides=(2, 2),
padding="SAME",
activation='relu'),
tf.keras.layers.Conv2DTranspose(
filters=32,
kernel_size=3,
strides=(2, 2),
padding="SAME",
activation='relu'),
# No activation
tf.keras.layers.Conv2DTranspose(
filters=1, kernel_size=3, strides=(1, 1), padding="SAME"),
]
)
@tf.function
def sample(self, eps=None):
if eps is None:
eps = tf.random.normal(shape=(100, self.latent_dim))
return self.decode(eps, apply_sigmoid=True)
def encode(self, x):
mean, logvar = tf.split(self.inference_net(x), num_or_size_splits=2, axis=1)
return mean, logvar
def reparameterize(self, mean, logvar):
eps = tf.random.normal(shape=mean.shape)
return eps * tf.exp(logvar * .5) + mean
def decode(self, z, apply_sigmoid=False):
logits = self.generative_net(z)
if apply_sigmoid:
probs = tf.sigmoid(logits)
return probs
return logits
```
## Define the loss function and the optimizer
VAEs train by maximizing the evidence lower bound (ELBO) on the marginal log-likelihood:
$$\log p(x) \ge \text{ELBO} = \mathbb{E}_{q(z|x)}\left[\log \frac{p(x, z)}{q(z|x)}\right].$$
In practice, we optimize the single-sample Monte Carlo estimate of this expectation:
$$\log p(x| z) + \log p(z) - \log q(z|x),$$
where $z$ is sampled from $q(z|x)$.
**Note**: We could also analytically compute the KL term, but here we incorporate all three terms in the Monte Carlo estimator for simplicity.
```py
optimizer = tf.keras.optimizers.Adam(1e-4)
def log_normal_pdf(sample, mean, logvar, raxis=1):
  log2pi = tf.math.log(2. * np.pi)
  return tf.reduce_sum(
      -.5 * ((sample - mean) ** 2. * tf.exp(-logvar) + logvar + log2pi),
      axis=raxis)
@tf.function
def compute_loss(model, x):
mean, logvar = model.encode(x)
z = model.reparameterize(mean, logvar)
x_logit = model.decode(z)
cross_ent = tf.nn.sigmoid_cross_entropy_with_logits(logits=x_logit, labels=x)
logpx_z = -tf.reduce_sum(cross_ent, axis=[1, 2, 3])
logpz = log_normal_pdf(z, 0., 0.)
logqz_x = log_normal_pdf(z, mean, logvar)
return -tf.reduce_mean(logpx_z + logpz - logqz_x)
@tf.function
def compute_apply_gradients(model, x, optimizer):
with tf.GradientTape() as tape:
loss = compute_loss(model, x)
gradients = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
```
## Training
* We start by iterating over the dataset
* During each iteration, we pass the image to the encoder to obtain a set of mean and log-variance parameters of the approximate posterior $q(z|x)$
* We then apply the *reparameterization trick* to sample from $q(z|x)$
* Finally, we pass the reparameterized samples to the decoder to obtain the logits of the generative distribution $p(x|z)$
* **Note:** Since we use the dataset loaded by keras with 60k datapoints in the training set and 10k datapoints in the test set, our resulting ELBO on the test set is slightly higher than reported results in the literature, which uses dynamic binarization of Larochelle's MNIST.
## Generate images
* After training, it is time to generate some images
* We start by sampling a set of latent vectors from the unit Gaussian prior distribution $p(z)$
* The generator will then convert the latent sample $z$ to logits of the observation, giving a distribution $p(x|z)$
* Here we plot the probabilities of Bernoulli distributions
```py
epochs = 100
latent_dim = 50
num_examples_to_generate = 16
# Keeping the random vector constant for generation (prediction) so
# it will be easier to see the improvement.
random_vector_for_generation = tf.random.normal(
shape=[num_examples_to_generate, latent_dim])
model = CVAE(latent_dim)
```
```py
def generate_and_save_images(model, epoch, test_input):
predictions = model.sample(test_input)
fig = plt.figure(figsize=(4,4))
for i in range(predictions.shape[0]):
plt.subplot(4, 4, i+1)
plt.imshow(predictions[i, :, :, 0], cmap='gray')
plt.axis('off')
# tight_layout minimizes the overlap between the two sub-plots
plt.savefig('image_at_epoch_{:04d}.png'.format(epoch))
plt.show()
```
```py
generate_and_save_images(model, 0, random_vector_for_generation)
for epoch in range(1, epochs + 1):
start_time = time.time()
for train_x in train_dataset:
compute_apply_gradients(model, train_x, optimizer)
end_time = time.time()
if epoch % 1 == 0:
loss = tf.keras.metrics.Mean()
for test_x in test_dataset:
loss(compute_loss(model, test_x))
elbo = -loss.result()
display.clear_output(wait=False)
print('Epoch: {}, Test set ELBO: {}, '
'time elapse for current epoch {}'.format(epoch,
elbo,
end_time - start_time))
generate_and_save_images(
model, epoch, random_vector_for_generation)
```
```py
Epoch: 100, Test set ELBO: -77.80061340332031, time elapse for current epoch 1.6898043155670166
```
![png](img/25c5372b82b31daf5535e4f1571434a9.png)
### Display an image using the epoch number
```py
def display_image(epoch_no):
return PIL.Image.open('image_at_epoch_{:04d}.png'.format(epoch_no))
```
```py
plt.imshow(display_image(epochs))
plt.axis('off')  # Display images
```
```py
(-0.5, 287.5, 287.5, -0.5)
```
![png](img/74d6d6302722b19888cd2b8a076a9899.png)
### Generate a GIF of all the saved images
```py
anim_file = 'cvae.gif'
with imageio.get_writer(anim_file, mode='I') as writer:
filenames = glob.glob('image*.png')
filenames = sorted(filenames)
last = -1
for i,filename in enumerate(filenames):
frame = 2*(i**0.5)
if round(frame) > round(last):
last = frame
else:
continue
image = imageio.imread(filename)
writer.append_data(image)
image = imageio.imread(filename)
writer.append_data(image)
import IPython
if IPython.version_info >= (6,2,0,''):
display.Image(filename=anim_file)
```
If you're working in Colab, you can download the animation with the code below:
```py
try:
from google.colab import files
except ImportError:
pass
else:
files.download(anim_file)
```
# Interpretability
# Integrated gradients
> Original: [https://tensorflow.google.cn/tutorials/interpretability/integrated_gradients](https://tensorflow.google.cn/tutorials/interpretability/integrated_gradients)
<devsite-mathjax config="TeX-AMS-MML_SVG"></devsite-mathjax>
This tutorial demonstrates how to implement **Integrated Gradients (IG)**, an [Explainable AI](https://en.wikipedia.org/wiki/Explainable_artificial_intelligence) technique introduced in the paper [Axiomatic Attribution for Deep Networks](https://arxiv.org/abs/1703.01365). IG aims to explain the relationship between a model's predictions and its features. It has many use cases, including understanding feature importances, identifying data skew, and debugging model performance.
IG has become a popular interpretability technique due to its broad applicability to any differentiable model (e.g. images, text, structured data), ease of implementation, theoretical justifications, and computational efficiency relative to alternative approaches, which allows it to scale to large networks and feature spaces such as images.
In this tutorial, you will walk through an implementation of IG step-by-step to understand the pixel feature importances of an image classifier. As an example, consider this [image](https://commons.wikimedia.org/wiki/File:San_Francisco_fireboat_showing_off.jpg) of a fireboat spraying jets of water. You would classify this image as a fireboat and might highlight the pixels making up the boat and water cannons as being important to your decision. Your model will also classify this image as a fireboat later on in this tutorial; however, does it highlight the same pixels as important when explaining its decision?
In the images below titled "IG Attribution Mask" and "Original + IG Mask Overlay" you can see that your model instead highlights (in purple) the pixels comprising the boat's water cannons and jets of water as being more important than the boat itself to its decision. How will your model generalize to new fireboats? What about fireboats without water jets? Read on to learn more about how IG works and how to apply IG to your models to better understand the relationship between their predictions and underlying features.
![Output Image 1](img/8350c367e4679800cd155cf00a343b47.png)
## Setup
```py
import matplotlib.pylab as plt
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
```
### Download a pretrained image classifier from TF-Hub
IG can be applied to any differentiable model. In the spirit of the original paper, you will use a pre-trained version of the same model, Inception V1, which you will download from [TensorFlow Hub](https://hub.tensorflow.google.cn/google/imagenet/inception_v1/classification/4).
```py
model = tf.keras.Sequential([
hub.KerasLayer(
name='inception_v1',
handle='https://hub.tensorflow.google.cn/google/imagenet/inception_v1/classification/4',
trainable=False),
])
model.build([None, 224, 224, 3])
model.summary()
```
```py
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
inception_v1 (KerasLayer) (None, 1001) 6633209
=================================================================
Total params: 6,633,209
Trainable params: 0
Non-trainable params: 6,633,209
_________________________________________________________________
```
From the module page, you need to keep in mind the following about Inception V1:
**Inputs**: The expected input shape for the model is `(None, 224, 224, 3)`. This is a dense 4D tensor of dtype float32 and shape `(batch_size, height, width, RGB channels)` whose elements are RGB color values of pixels normalized to the range [0, 1]. The first element is `None` to indicate that the model can take any integer batch size.
**Outputs**: A [`tf.Tensor`](https://tensorflow.google.cn/api_docs/python/tf/Tensor) of logits in the shape of `(batch_size, 1001)`. Each row represents the model's predicted score for each of 1,001 classes from ImageNet. For the model's top predicted class index you can use `tf.argmax(predictions, axis=-1)`. Furthermore, you can also convert the model's logit output to predicted probabilities across all classes using `tf.nn.softmax(predictions, axis=-1)` to quantify the model's uncertainty as well as explore similar predicted classes for debugging.
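Those two post-processing steps can be sketched with NumPy stand-ins for the `tf` ops (made-up logits over 5 classes instead of 1,001):

```python
import numpy as np

# Hypothetical logits for a batch of two images over 5 classes.
logits = np.array([[2.0, 0.5, 0.1, -1.0, 0.3],
                   [0.2, 3.1, -0.4, 0.0, 1.0]])

# Top predicted class per example (tf.argmax(predictions, axis=-1)).
top_class = np.argmax(logits, axis=-1)

# Logits to probabilities (tf.nn.softmax(predictions, axis=-1)),
# shifted by the row max for numerical stability.
shifted = logits - logits.max(axis=-1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
```

Each row of `probs` sums to 1, and `top_class` picks the index of the largest logit per example.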
```py
def load_imagenet_labels(file_path):
labels_file = tf.keras.utils.get_file('ImageNetLabels.txt', file_path)
with open(labels_file) as reader:
f = reader.read()
labels = f.splitlines()
return np.array(labels)
```
```py
imagenet_labels = load_imagenet_labels('https://storage.googleapis.com/download.tensorflow.org/data/ImageNetLabels.txt')
```
### Load and preprocess images with [`tf.image`](https://tensorflow.google.cn/api_docs/python/tf/image)
You will illustrate IG using two images from [Wikimedia Commons](https://commons.wikimedia.org/wiki/Main_Page): a [Fireboat](https://commons.wikimedia.org/wiki/File:San_Francisco_fireboat_showing_off.jpg), and a [Giant Panda](https://commons.wikimedia.org/wiki/File:Giant_Panda_2.JPG).
```py
def read_image(file_name):
image = tf.io.read_file(file_name)
image = tf.image.decode_jpeg(image, channels=3)
image = tf.image.convert_image_dtype(image, tf.float32)
image = tf.image.resize_with_pad(image, target_height=224, target_width=224)
return image
```
```py
img_url = {
'Fireboat': 'http://storage.googleapis.com/download.tensorflow.org/example_images/San_Francisco_fireboat_showing_off.jpg',
'Giant Panda': 'http://storage.googleapis.com/download.tensorflow.org/example_images/Giant_Panda_2.jpeg',
}
img_paths = {name: tf.keras.utils.get_file(name, url) for (name, url) in img_url.items()}
img_name_tensors = {name: read_image(img_path) for (name, img_path) in img_paths.items()}
```
```py
Downloading data from http://storage.googleapis.com/download.tensorflow.org/example_images/San_Francisco_fireboat_showing_off.jpg
3956736/3954129 [==============================] - 0s 0us/step
Downloading data from http://storage.googleapis.com/download.tensorflow.org/example_images/Giant_Panda_2.jpeg
811008/802859 [==============================] - 0s 0us/step
```
```py
plt.figure(figsize=(8, 8))
for n, (name, img_tensors) in enumerate(img_name_tensors.items()):
ax = plt.subplot(1, 2, n+1)
ax.imshow(img_tensors)
ax.set_title(name)
ax.axis('off')
plt.tight_layout()
```
![png](img/e68189c9da69b7848e9033d29a0dc574.png)
### Classify images
Let's start by classifying these images and displaying the top 3 most confident predictions. Following is a utility function to retrieve the top k predicted labels and probabilities.
```py
def top_k_predictions(img, k=3):
image_batch = tf.expand_dims(img, 0)
predictions = model(image_batch)
probs = tf.nn.softmax(predictions, axis=-1)
top_probs, top_idxs = tf.math.top_k(input=probs, k=k)
top_labels = imagenet_labels[tuple(top_idxs)]
return top_labels, top_probs[0]
```
```py
for (name, img_tensor) in img_name_tensors.items():
plt.imshow(img_tensor)
plt.title(name, fontweight='bold')
plt.axis('off')
plt.show()
pred_label, pred_prob = top_k_predictions(img_tensor)
for label, prob in zip(pred_label, pred_prob):
print(f'{label}: {prob:0.1%}')
```
![png](img/518bc2d08038969576066eb381910cc1.png)
```py
fireboat: 32.6%
pier: 12.7%
suspension bridge: 5.7%
```
![png](img/fecda9bde6f4c7551c164dc066491cb5.png)
```py
giant panda: 89.4%
teddy: 0.3%
gibbon: 0.3%
```
## Calculate Integrated Gradients
Your model, Inception V1, is a learned function that describes a mapping between your input feature space, image pixel values, and an output space defined by ImageNet class probability values between 0 and 1. Early interpretability methods for neural networks assigned feature importance scores using gradients, which tell you which pixels have the steepest local slope relative to your model's prediction at a given point along your model's prediction function. However, gradients only describe *local* changes in your model's prediction function with respect to pixel values and do not fully describe your entire model prediction function. As your model fully "learns" the relationship between the range of an individual pixel and the correct ImageNet class, the gradient for this pixel will *saturate*, meaning become increasingly small and even go to zero. Consider the simple model function below:
```py
def f(x):
"""A simplified model function."""
return tf.where(x < 0.8, x, 0.8)
def interpolated_path(x):
"""A straight line path."""
return tf.zeros_like(x)
x = tf.linspace(start=0.0, stop=1.0, num=6)
y = f(x)
```
```py
fig = plt.figure(figsize=(12, 5))
ax0 = fig.add_subplot(121)
ax0.plot(x, f(x), marker='o')
ax0.set_title('Gradients saturate over F(x)', fontweight='bold')
ax0.text(0.2, 0.5, 'Gradients > 0 = \n x is important')
ax0.text(0.7, 0.85, 'Gradients = 0 \n x not important')
ax0.set_yticks(tf.range(0, 1.5, 0.5))
ax0.set_xticks(tf.range(0, 1.5, 0.5))
ax0.set_ylabel('F(x) - model true class predicted probability')
ax0.set_xlabel('x - (pixel value)')
ax1 = fig.add_subplot(122)
ax1.plot(x, f(x), marker='o')
ax1.plot(x, interpolated_path(x), marker='>')
ax1.set_title('IG intuition', fontweight='bold')
ax1.text(0.25, 0.1, 'Accumulate gradients along path')
ax1.set_ylabel('F(x) - model true class predicted probability')
ax1.set_xlabel('x - (pixel value)')
ax1.set_yticks(tf.range(0, 1.5, 0.5))
ax1.set_xticks(tf.range(0, 1.5, 0.5))
ax1.annotate('Baseline', xy=(0.0, 0.0), xytext=(0.0, 0.2),
arrowprops=dict(facecolor='black', shrink=0.1))
ax1.annotate('Input', xy=(1.0, 0.0), xytext=(0.95, 0.2),
arrowprops=dict(facecolor='black', shrink=0.1))
plt.show();
```
![png](img/6d8da708f09878fc993e75adb40fd2a1.png)
* **left**: Your model's gradients for pixel `x` are positive between 0.0 and 0.8 but go to 0.0 between 0.8 and 1.0. Pixel `x` clearly has a significant impact on pushing your model toward 80% predicted probability on the true class. *Does it make sense that pixel `x`'s importance is small or discontinuous?*
* **right**: The intuition behind IG is to accumulate pixel `x`'s local gradients and attribute its importance as a score for how much it adds or subtracts to your model's overall output class probability. You can break down and compute IG in 3 parts:
1. interpolate small steps along a straight line in the feature space between 0 (a baseline or starting point) and 1 (input pixel's value)
2. compute gradients between your model's predictions with respect to each step
3. approximate the integral between your baseline and input by accumulating (cumulative average) these local gradients.
To reinforce this intuition, you will walk through these 3 parts by applying IG to the example "Fireboat" image below.
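The three parts can also be sketched end-to-end on the simplified scalar model above, using its analytic gradient instead of `tf.GradientTape` for brevity (an illustration, not the tutorial's implementation):

```python
import numpy as np

def f(x):
    # The simplified model from above: slope 1 below 0.8, saturated (0) after.
    return np.minimum(x, 0.8)

baseline, x_input, m = 0.0, 1.0, 200

# 1. Interpolate m steps on a straight line from the baseline to the input.
alphas = np.linspace(0.0, 1.0, m + 1)
points = baseline + alphas * (x_input - baseline)

# 2. Gradient of f at each step (analytic here: 1 where x < 0.8, else 0).
grads = np.where(points < 0.8, 1.0, 0.0)

# 3. Average the gradients and scale by (input - baseline).
ig = (x_input - baseline) * grads.mean()
```

With enough steps, `ig` approaches `f(x_input) - f(baseline) = 0.8`: attributions sum to the change in the model's output, the completeness property of IG.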
### Establish a baseline
A baseline is an input image used as a starting point for calculating feature importance. Intuitively, you can think of the baseline's explanatory role as representing the impact of the absence of each pixel on the "Fireboat" prediction, to contrast with the impact of each pixel on the "Fireboat" prediction when it is present in the input image. As a result, the choice of the baseline plays a central role in interpreting and visualizing pixel feature importances. For additional discussion of baseline selection, see the resources in the "Next steps" section at the bottom of this tutorial. Here, you will use a black image whose pixel values are all zero.
Other choices you could experiment with include an all white image, or a random image, which you can create with `tf.random.uniform(shape=(224,224,3), minval=0.0, maxval=1.0)`.
```py
baseline = tf.zeros(shape=(224,224,3))
```
```py
plt.imshow(baseline)
plt.title("Baseline")
plt.axis('off')
plt.show()
```
![png](img/3e1bc64db4c260d2327ca5a9defae306.png)
### Unpack formulas into code
The formula for Integrated Gradients is as follows:
$IntegratedGradients_{i}(x) ::= (x_{i} - x'_{i})\times\int_{\alpha=0}^1\frac{\partial F(x'+\alpha \times (x - x'))}{\partial x_i}{d\alpha}$
where:
$_{i}$ = feature
$x$ = input
$x'$ = baseline
$\alpha$ = interpolation constant to perturb features by
In practice, computing a definite integral is not always numerically possible and can be computationally costly, so you compute the following numerical approximation:
$IntegratedGrads^{approx}_{i}(x)::=(x_{i}-x'_{i})\times\sum_{k=1}^{m}\frac{\partial F(x' + \frac{k}{m}\times(x - x'))}{\partial x_{i} } \times \frac{1}{m}$
where:
$_{i}$ = feature (individual pixel)
$x$ = input (image tensor)
$x'$ = baseline (image tensor)
$k$ = scaled feature perturbation constant
$m$ = number of steps in the Riemann sum approximation of the integral
$(x_{i}-x'_{i})$ = a term for the difference from the baseline. This is necessary to scale the integrated gradients and keep them in terms of the original image. The path from the baseline image to the input is in pixel space. Since with IG you are integrating in a straight line (linear transformation) this ends up being roughly equivalent to the integral term of the derivative of the interpolated image function with respect to $\alpha$ with enough steps. The integral sums each pixel's gradient times the change in the pixel along the path. It's simpler to implement this integration as uniform steps from one image to the other, substituting $x := (x' + \alpha(x-x'))$. So the change of variables gives $dx = (x-x')d\alpha$. The $(x-x')$ term is constant and is factored out of the integral.
### Interpolate images
$IntegratedGrads^{approx}_{i}(x)::=(x_{i}-x'_{i})\times\sum_{k=1}^{m}\frac{\partial F(\overbrace{x' + \frac{k}{m}\times(x - x')}^\text{interpolate m images at k intervals})}{\partial x_{i} } \times \frac{1}{m}$
First, you will generate a [linear interpolation](https://en.wikipedia.org/wiki/Linear_interpolation) between the baseline and the original image. You can think of interpolated images as small steps in the feature space between your baseline and input, represented by $\alpha$ in the original equation.
```py
m_steps=50
alphas = tf.linspace(start=0.0, stop=1.0, num=m_steps+1) # Generate m_steps intervals for integral_approximation() below.
```
```py
def interpolate_images(baseline,
image,
alphas):
alphas_x = alphas[:, tf.newaxis, tf.newaxis, tf.newaxis]
baseline_x = tf.expand_dims(baseline, axis=0)
input_x = tf.expand_dims(image, axis=0)
delta = input_x - baseline_x
images = baseline_x + alphas_x * delta
return images
```
Let's use the above function to generate interpolated images along a linear path at alpha intervals between a black baseline image and the example "Fireboat" image.
```py
interpolated_images = interpolate_images(
baseline=baseline,
image=img_name_tensors['Fireboat'],
alphas=alphas)
```
Let's visualize the interpolated images. Note: another way of thinking about the $\alpha$ constant is that it consistently increases each interpolated image's intensity.
```py
fig = plt.figure(figsize=(20, 20))
i = 0
for alpha, image in zip(alphas[0::10], interpolated_images[0::10]):
i += 1
plt.subplot(1, len(alphas[0::10]), i)
plt.title(f'alpha: {alpha:.1f}')
plt.imshow(image)
plt.axis('off')
plt.tight_layout();
```
![png](img/e2e6d59bb8ebd47a957558d11e836ec1.png)
### Compute gradients
Now let's take a look at how to calculate gradients in order to measure the relationship between changes to a feature and changes in the model's predictions. In the case of images, the gradient tells us which pixels have the strongest effect on the model's predicted class probabilities.
$IntegratedGrads^{approx}_{i}(x)::=(x_{i}-x'_{i})\times\sum_{k=1}^{m}\frac{\overbrace{\partial F(\text{interpolated images})}^\text{compute gradients} }{\partial x_{i} } \times \frac{1}{m}$
where:

*   $F()$ = your model's prediction function
*   $\frac{\partial{F} }{\partial{x_i} }$ = gradient (vector of partial derivatives $\partial$) of your model $F$'s prediction function relative to each feature $x_i$
TensorFlow makes computing gradients easy for you with a [`tf.GradientTape`](https://tensorflow.google.cn/api_docs/python/tf/GradientTape).
```py
def compute_gradients(images, target_class_idx):
with tf.GradientTape() as tape:
tape.watch(images)
logits = model(images)
probs = tf.nn.softmax(logits, axis=-1)[:, target_class_idx]
return tape.gradient(probs, images)
```
Let's compute the gradients for each image along the interpolation path with respect to the correct output. Recall that your model returns a `(1, 1001)` shaped `Tensor` with logits that you convert to predicted probabilities for each class. You need to pass the correct ImageNet target class index to the `compute_gradients` function for your image.
```py
path_gradients = compute_gradients(
images=interpolated_images,
target_class_idx=555)
```
Note the output shape of `(n_interpolated_images, img_height, img_width, RGB)`, which gives us the gradient for every pixel of every image along the interpolation path. You can think of these gradients as measuring the change in your model's predictions for each small step in the feature space.
```py
print(path_gradients.shape)
```
```py
(51, 224, 224, 3)
```
**Visualizing gradient saturation**
Recall that the gradients you just calculated above describe *local* changes to your model's predicted probability of "Fireboat" and can *saturate*.
The two plots below visualize these concepts using the gradients you calculated above.
```py
pred = model(interpolated_images)
pred_proba = tf.nn.softmax(pred, axis=-1)[:, 555]
plt.figure(figsize=(10, 4))
ax1 = plt.subplot(1, 2, 1)
ax1.plot(alphas, pred_proba)
ax1.set_title('Target class predicted probability over alpha')
ax1.set_ylabel('model p(target class)')
ax1.set_xlabel('alpha')
ax1.set_ylim([0, 1])
ax2 = plt.subplot(1, 2, 2)
# Average across interpolation steps
average_grads = tf.reduce_mean(path_gradients, axis=[1, 2, 3])
# Normalize gradients to 0 to 1 scale. E.g. (x - min(x))/(max(x)-min(x))
average_grads_norm = (average_grads-tf.math.reduce_min(average_grads))/(tf.math.reduce_max(average_grads)-tf.reduce_min(average_grads))
ax2.plot(alphas, average_grads_norm)
ax2.set_title('Average pixel gradients (normalized) over alpha')
ax2.set_ylabel('Average pixel gradients')
ax2.set_xlabel('alpha')
ax2.set_ylim([0, 1]);
```
```py
(0.0, 1.0)
```
![png](img/0b0835e78f54f2c464c9df77cfe6a93b.png)
*   **left**: This plot shows how your model's confidence in the "Fireboat" class varies across alphas. Notice how the gradients, or slope of the line, largely flatten or saturate between 0.6 and 1.0 before settling at the final "Fireboat" predicted probability of about 40%.
*   **right**: The right plot shows the average gradient magnitudes over alpha more directly. Note how the values sharply approach and even briefly dip below zero. In fact, your model "learns" the most from gradients at lower values of alpha, before saturating. Intuitively, you can think of it this way: your model has already learned the pixels, e.g. the water cannons, needed to make the correct prediction, sending those pixels' gradients to zero, but it is still quite uncertain and focused on spurious bridge or water-jet pixels as alpha approaches the original input image.
To make sure these important water cannon pixels are reflected as important to the "Fireboat" prediction, you will continue on below to learn how to accumulate these gradients to accurately approximate how each pixel impacts your "Fireboat" predicted probability.
### Accumulate gradients (integral approximation)
There are many different ways you can go about computing the numerical approximation of an integral for IG with different tradeoffs in accuracy and convergence across varying functions. A popular class of methods is called [Riemann sums](https://en.wikipedia.org/wiki/Riemann_sum). Here, you will use the Trapezoidal rule (you can find additional code to explore different approximation methods at the end of this tutorial).
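To build intuition for why the trapezoidal rule is a good default, the following standalone NumPy sketch (independent of the model; the integrand and step count are arbitrary choices) compares left, right, and trapezoidal Riemann sums of $\int_0^1 z^2 dz = 1/3$:

```python
import numpy as np

f = lambda z: z ** 2  # toy integrand; the exact integral over [0, 1] is 1/3
m = 50
grid = np.linspace(0.0, 1.0, m + 1)

# Left and right sums evaluate f at one endpoint of each sub-interval; the
# trapezoidal rule averages both endpoints (the same averaging used by the
# integral_approximation function defined below).
left = np.sum(f(grid[:-1])) / m
right = np.sum(f(grid[1:])) / m
trapezoidal = np.sum((f(grid[:-1]) + f(grid[1:])) / 2.0) / m

for name, approx in [('left', left), ('right', right), ('trapezoidal', trapezoidal)]:
    print(f'{name:12s} error: {abs(approx - 1 / 3):.6f}')
```

For the same number of steps, the trapezoidal error is orders of magnitude smaller than either one-sided sum.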
$IntegratedGrads^{approx}_{i}(x)::=(x_{i}-x'_{i})\times \overbrace{\sum_{k=1}^{m} }^\text{Sum m local gradients} \text{gradients(interpolated images)} \times \overbrace{\frac{1}{m} }^\text{Divide by m steps}$
From the equation, you can see you are summing over `m` gradients and dividing by `m` steps. You can implement the two operations together for part 3 as an *average of the local gradients of `m` interpolated predictions and input images*.
```py
def integral_approximation(gradients):
# riemann_trapezoidal
grads = (gradients[:-1] + gradients[1:]) / tf.constant(2.0)
integrated_gradients = tf.math.reduce_mean(grads, axis=0)
return integrated_gradients
```
The `integral_approximation` function takes the gradients of the predicted probability of the target class with respect to the interpolated images between the baseline and the original image.
```py
ig = integral_approximation(
gradients=path_gradients)
```
You can confirm averaging across the gradients of `m` interpolated images returns an integrated gradients tensor with the same shape as the original "Fireboat" image.
```py
print(ig.shape)
```
```py
(224, 224, 3)
```
### Putting it all together
Now you will combine the 3 previous general parts together into an `IntegratedGradients` function and utilize a [@tf.function](https://tensorflow.google.cn/guide/function) decorator to compile it into a high performance callable TensorFlow graph. This is implemented as 5 smaller steps below:
$IntegratedGrads^{approx}_{i}(x)::=\overbrace{(x_{i}-x'_{i})}^\text{5.}\times \overbrace{\sum_{k=1}^{m} }^\text{4.} \frac{\partial \overbrace{F(\overbrace{x' + \overbrace{\frac{k}{m} }^\text{1.}\times(x - x'))}^\text{2.} }^\text{3.} }{\partial x_{i} } \times \overbrace{\frac{1}{m} }^\text{4.}$
1. Generate alphas $\alpha$
2. Generate interpolated images = $(x' + \frac{k}{m}\times(x - x'))$
3. Compute gradients between model $F$ output predictions with respect to input features = $\frac{\partial F(\text{interpolated path inputs})}{\partial x_{i} }$
4. Integral approximation through averaging gradients = $\sum_{k=1}^m \text{gradients} \times \frac{1}{m}$
5. Scale integrated gradients with respect to original image = $(x_{i}-x'_{i}) \times \text{integrated gradients}$. The reason this step is necessary is to make sure that the attribution values accumulated across multiple interpolated images are in the same units and faithfully represent the pixel importances on the original image.
```py
@tf.function
def integrated_gradients(baseline,
image,
target_class_idx,
m_steps=50,
batch_size=32):
  # 1. Generate alphas.
alphas = tf.linspace(start=0.0, stop=1.0, num=m_steps+1)
# Initialize TensorArray outside loop to collect gradients.
gradient_batches = tf.TensorArray(tf.float32, size=m_steps+1)
# Iterate alphas range and batch computation for speed, memory efficiency, and scaling to larger m_steps.
for alpha in tf.range(0, len(alphas), batch_size):
from_ = alpha
to = tf.minimum(from_ + batch_size, len(alphas))
alpha_batch = alphas[from_:to]
    # 2. Generate interpolated inputs between baseline and input.
interpolated_path_input_batch = interpolate_images(baseline=baseline,
image=image,
alphas=alpha_batch)
    # 3. Compute gradients between model outputs and interpolated inputs.
gradient_batch = compute_gradients(images=interpolated_path_input_batch,
target_class_idx=target_class_idx)
# Write batch indices and gradients to extend TensorArray.
gradient_batches = gradient_batches.scatter(tf.range(from_, to), gradient_batch)
# Stack path gradients together row-wise into single tensor.
total_gradients = gradient_batches.stack()
  # 4. Integral approximation through averaging gradients.
avg_gradients = integral_approximation(gradients=total_gradients)
  # 5. Scale integrated gradients with respect to input.
integrated_gradients = (image - baseline) * avg_gradients
return integrated_gradients
```
```py
ig_attributions = integrated_gradients(baseline=baseline,
image=img_name_tensors['Fireboat'],
target_class_idx=555,
m_steps=240)
```
Again, you can check that the IG feature attributions have the same shape as the input "Fireboat" image.
```py
print(ig_attributions.shape)
```
```py
(224, 224, 3)
```
The paper suggests the number of steps should range between 20 and 300 depending upon the example (although in practice this can be higher, in the thousands, to accurately approximate the integral). You can find additional code to check for the appropriate number of steps in the "Next steps" resources at the end of this tutorial.
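One common diagnostic for whether `m_steps` is large enough is IG's completeness axiom: the attributions should sum approximately to $F(x) - F(x')$. The sketch below checks this in plain NumPy with a toy quadratic model (the names and the model itself are illustrative, not the Inception model used above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy quadratic "model": F(x) = sum(x**2), with gradient 2*x (illustrative only).
F = lambda x: float(np.sum(x ** 2))
grad_F = lambda x: 2.0 * x

x_baseline = np.zeros(8)
x_input = rng.normal(size=8)

def ig_approx(m):
    # Right-endpoint Riemann sum of the IG integral along the straight path.
    alphas = np.arange(1, m + 1) / m
    grads = np.stack([grad_F(x_baseline + a * (x_input - x_baseline))
                      for a in alphas])
    return (x_input - x_baseline) * grads.mean(axis=0)

# Completeness: attributions should sum to F(input) - F(baseline).
target = F(x_input) - F(x_baseline)
for m in (5, 50, 500):
    gap = abs(ig_approx(m).sum() - target)
    print(f'm_steps={m:4d}  completeness gap: {gap:.6f}')
```

Applied to this tutorial's model, the analogous check would roughly compare `tf.reduce_sum(ig_attributions)` against the difference in the target class probability between the "Fireboat" image and the baseline.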
### Visualize attributions
You are ready to visualize attributions, and overlay them on the original image. The code below sums the absolute values of the integrated gradients across the color channels to produce an attribution mask. This plotting method captures the relative impact of pixels on the model's predictions.
```py
def plot_img_attributions(baseline,
image,
target_class_idx,
m_steps=50,
cmap=None,
overlay_alpha=0.4):
attributions = integrated_gradients(baseline=baseline,
image=image,
target_class_idx=target_class_idx,
m_steps=m_steps)
# Sum of the attributions across color channels for visualization.
# The attribution mask shape is a grayscale image with height and width
# equal to the original image.
attribution_mask = tf.reduce_sum(tf.math.abs(attributions), axis=-1)
fig, axs = plt.subplots(nrows=2, ncols=2, squeeze=False, figsize=(8, 8))
axs[0, 0].set_title('Baseline image')
axs[0, 0].imshow(baseline)
axs[0, 0].axis('off')
axs[0, 1].set_title('Original image')
axs[0, 1].imshow(image)
axs[0, 1].axis('off')
axs[1, 0].set_title('Attribution mask')
axs[1, 0].imshow(attribution_mask, cmap=cmap)
axs[1, 0].axis('off')
axs[1, 1].set_title('Overlay')
axs[1, 1].imshow(attribution_mask, cmap=cmap)
axs[1, 1].imshow(image, alpha=overlay_alpha)
axs[1, 1].axis('off')
plt.tight_layout()
return fig
```
Looking at the attributions on the "Fireboat" image, you can see the model identifies the water cannons and spouts as contributing to its correct prediction.
```py
_ = plot_img_attributions(image=img_name_tensors['Fireboat'],
baseline=baseline,
target_class_idx=555,
m_steps=240,
cmap=plt.cm.inferno,
overlay_alpha=0.4)
```
![png](img/29af5825a7303165115c9cfbc59ae606.png)
On the "Giant Panda" image, the attributions highlight the texture, nose, and the fur of the Panda's face.
```py
_ = plot_img_attributions(image=img_name_tensors['Giant Panda'],
baseline=baseline,
target_class_idx=389,
m_steps=55,
cmap=plt.cm.viridis,
overlay_alpha=0.5)
```
![png](img/07f89687b786f68c1561b81ac448c45e.png)
## Uses and limitations
**Use cases**
* Employing techniques like Integrated Gradients before deploying your model can help you develop intuition for how and why it works. Do the features highlighted by this technique match your intuition? If not, that may be indicative of a bug in your model or dataset, or overfitting.
**Limitations**
* Integrated Gradients provides feature importances on individual examples; however, it does not provide global feature importances across an entire dataset.
* Integrated Gradients provides individual feature importances, but it does not explain feature interactions and combinations.
## Next steps
This tutorial presented a basic implementation of Integrated Gradients. As a next step, you can use this notebook to try this technique with different models and images yourself.
For interested readers, there is a lengthier version of this tutorial (which includes code for different baselines, to compute integral approximations, and to determine a sufficient number of steps) which you can find [here](https://github.com/GoogleCloudPlatform/training-data-analyst/tree/master/blogs/integrated_gradients).
To deepen your understanding, check out the paper [Axiomatic Attribution for Deep Networks](https://arxiv.org/abs/1703.01365) and [Github repository](https://github.com/ankurtaly/Integrated-Gradients), which contains an implementation in a previous version of TensorFlow. You can also explore feature attribution, and the impact of different baselines, on [distill.pub](https://distill.pub/2020/attribution-baselines/).
Interested in incorporating IG into your production machine learning workflows for feature importances, model error analysis, and data skew monitoring? Check out Google Cloud's [Explainable AI](https://cloud.google.com/explainable-ai) product that supports IG attributions. The Google AI PAIR research group also open-sourced the [What-if tool](https://pair-code.github.io/what-if-tool/index.html#about) which can be used for model debugging, including visualizing IG feature attributions.
# Reinforcement Learning
# Playing CartPole with the Actor-Critic Method
> Original: [https://tensorflow.google.cn/tutorials/reinforcement_learning/actor_critic](https://tensorflow.google.cn/tutorials/reinforcement_learning/actor_critic)
<devsite-mathjax config="TeX-AMS-MML_SVG"></devsite-mathjax>
This tutorial demonstrates how to implement the [Actor-Critic](https://papers.nips.cc/paper/1786-actor-critic-algorithms.pdf) method using TensorFlow to train an agent on the [OpenAI Gym](https://gym.openai.com/) CartPole-v0 environment. The reader is assumed to have some familiarity with [policy gradient methods](https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf) of reinforcement learning.
**Actor-Critic methods**
Actor-Critic methods are [temporal difference (TD) learning](https://en.wikipedia.org/wiki/Temporal_difference_learning) methods that represent the policy function independent of the value function.
A policy function (or policy) returns a probability distribution over actions that the agent can take based on the given state. A value function determines the expected return for an agent starting at a given state and acting according to a particular policy forever after.
In the Actor-Critic method, the policy is referred to as the *actor* that proposes a set of possible actions given a state, and the estimated value function is referred to as the *critic*, which evaluates actions taken by the *actor* based on the given policy.
In this tutorial, both the *Actor* and *Critic* will be represented using one neural network with two outputs.
**CartPole-v0**
In the [CartPole-v0 environment](https://gym.openai.com/envs/CartPole-v0), a pole is attached to a cart moving along a frictionless track. The pole starts upright and the goal of the agent is to prevent it from falling over by applying a force of -1 or +1 to the cart. A reward of +1 is given for every time step the pole remains upright. An episode ends when (1) the pole is more than 15 degrees from vertical or (2) the cart moves more than 2.4 units from the center.
<center>
<figure>![](/tutorials/reinforcement_learning/images/cartpole-v0.gif)
<figcaption>Trained actor-critic model in Cartpole-v0 environment</figcaption>
</figure>
</center>
The problem is considered "solved" when the average total reward for the episode reaches 195 over 100 consecutive trials.
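This "solved" criterion can be sketched as a rolling window over the last 100 episode rewards (a minimal illustration with made-up values; names like `episodes_reward` are not from the tutorial code below):

```python
import collections
import statistics

# Rolling window of the last 100 episode rewards (illustrative names).
episodes_reward = collections.deque(maxlen=100)

def is_solved(reward_threshold=195, min_episodes=100):
    # Solved only once at least 100 episodes have been recorded and
    # their average total reward reaches the threshold.
    return (len(episodes_reward) >= min_episodes and
            statistics.mean(episodes_reward) >= reward_threshold)

episodes_reward.extend([200] * 99)
print(is_solved())  # False: fewer than 100 episodes recorded

episodes_reward.append(200)
print(is_solved())  # True: mean reward 200 >= 195 over 100 episodes
```

The training loop later in this tutorial tracks the same criterion with an exponential moving average instead of an explicit window.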
## Setup
Import necessary packages and configure global settings.
```py
pip install -q gym
```
```py
WARNING: You are using pip version 20.2.2; however, version 20.2.3 is available.
You should consider upgrading via the '/tmpfs/src/tf_docs_env/bin/python -m pip install --upgrade pip' command.
```
```py
# Install additional packages for visualization
sudo apt-get install -y xvfb python-opengl > /dev/null 2>&1
pip install -q pyvirtualdisplay > /dev/null 2>&1
pip install -q git+https://github.com/tensorflow/docs > /dev/null 2>&1
```
```py
import collections
import gym
import numpy as np
import tensorflow as tf
import tqdm
from matplotlib import pyplot as plt
from tensorflow.keras import layers
from typing import Any, List, Sequence, Tuple
# Create the environment
env = gym.make("CartPole-v0")
# Set seed for experiment reproducibility
seed = 42
env.seed(seed)
tf.random.set_seed(seed)
np.random.seed(seed)
# Small epsilon value for stabilizing division operations
eps = np.finfo(np.float32).eps.item()
```
## Model
The *Actor* and *Critic* will be modeled using one neural network that generates the action probabilities and critic value respectively. We use model subclassing to define the model.
During the forward pass, the model will take in the state as the input and will output both action probabilities and critic value $V$, which models the state-dependent [value function](https://spinningup.openai.com/en/latest/spinningup/rl_intro.html#value-functions). The goal is to train a model that chooses actions based on a policy $\pi$ that maximizes expected [return](https://spinningup.openai.com/en/latest/spinningup/rl_intro.html#reward-and-return).
For CartPole-v0, there are four values representing the state: cart position, cart velocity, pole angle, and pole angular velocity. The agent can take one of two actions: push the cart left (0) or push it right (1).
Refer to the [CartPole learning control problem paper [Barto, et al. 1983]](http://www.derongliu.org/adp/adp-cdrom/Barto1983.pdf) for more information.
```py
class ActorCritic(tf.keras.Model):
"""Combined actor-critic network."""
def __init__(
self,
num_actions: int,
num_hidden_units: int):
"""Initialize."""
super().__init__()
self.common = layers.Dense(num_hidden_units, activation="relu")
self.actor = layers.Dense(num_actions)
self.critic = layers.Dense(1)
def call(self, inputs: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]:
x = self.common(inputs)
return self.actor(x), self.critic(x)
```
```py
num_actions = env.action_space.n # 2
num_hidden_units = 128
model = ActorCritic(num_actions, num_hidden_units)
```
## Training
To train the agent, you will follow these steps:
1. Run the agent on the environment to collect training data per episode.
2. Compute expected return at each time step.
3. Compute the loss for the combined actor-critic model.
4. Compute gradients and update network parameters.
5. Repeat 1-4 until either success criterion or max episodes has been reached.
### 1\. Collecting training data
As in supervised learning, in order to train the actor-critic model, we need to have training data. However, in order to collect such data, the model would need to be "run" in the environment.
We collect training data for each episode. Then at each time step, the model's forward pass will be run on the environment's state in order to generate action probabilities and the critic value based on the current policy parameterized by the model's weights.
The next action will be sampled from the action probabilities generated by the model, which would then be applied to the environment, causing the next state and reward to be generated.
This process is implemented in the `run_episode` function, which uses TensorFlow operations so that it can later be compiled into a TensorFlow graph for faster training. Note that [`tf.TensorArray`](https://tensorflow.google.cn/api_docs/python/tf/TensorArray)s were used to support Tensor iteration on variable length arrays.
```py
# Wrap OpenAI Gym's `env.step` call as an operation in a TensorFlow function.
# This would allow it to be included in a callable TensorFlow graph.
def env_step(action: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
"""Returns state, reward and done flag given an action."""
state, reward, done, _ = env.step(action)
return (state.astype(np.float32),
np.array(reward, np.int32),
np.array(done, np.int32))
def tf_env_step(action: tf.Tensor) -> List[tf.Tensor]:
return tf.numpy_function(env_step, [action],
[tf.float32, tf.int32, tf.int32])
```
```py
def run_episode(
initial_state: tf.Tensor,
model: tf.keras.Model,
max_steps: int) -> List[tf.Tensor]:
"""Runs a single episode to collect training data."""
action_probs = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True)
values = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True)
rewards = tf.TensorArray(dtype=tf.int32, size=0, dynamic_size=True)
initial_state_shape = initial_state.shape
state = initial_state
for t in tf.range(max_steps):
# Convert state into a batched tensor (batch size = 1)
state = tf.expand_dims(state, 0)
# Run the model and to get action probabilities and critic value
action_logits_t, value = model(state)
# Sample next action from the action probability distribution
action = tf.random.categorical(action_logits_t, 1)[0, 0]
action_probs_t = tf.nn.softmax(action_logits_t)
# Store critic values
values = values.write(t, tf.squeeze(value))
# Store log probability of the action chosen
action_probs = action_probs.write(t, action_probs_t[0, action])
# Apply action to the environment to get next state and reward
state, reward, done = tf_env_step(action)
state.set_shape(initial_state_shape)
# Store reward
rewards = rewards.write(t, reward)
if tf.cast(done, tf.bool):
break
action_probs = action_probs.stack()
values = values.stack()
rewards = rewards.stack()
return action_probs, values, rewards
```
### 2\. Computing expected returns
We convert the sequence of rewards for each timestep $t$, $\{r_{t}\}^{T}_{t=1}$, collected during one episode into a sequence of expected returns $\{G_{t}\}^{T}_{t=1}$, in which the sum of rewards is taken from the current timestep $t$ to $T$ and each reward is multiplied by an exponentially decaying discount factor $\gamma$:
$$G_{t} = \sum^{T}_{t'=t} \gamma^{t'-t}r_{t'}$$
Since $\gamma\in(0,1)$, rewards further out from the current timestep are given less weight.
Intuitively, expected return simply implies that rewards now are better than rewards later. In a mathematical sense, it is to ensure that the sum of the rewards converges.
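Here is a quick numeric sketch (with made-up constant rewards) of how the discount factor controls the scale of the return:

```python
import numpy as np

rewards = np.ones(1000)  # a long episode of constant +1 rewards

for gamma in (1.0, 0.99, 0.9):
    discounts = gamma ** np.arange(len(rewards))
    # Discounted return G_0 from the first timestep.
    print(f'gamma={gamma}: G_0 = {np.sum(rewards * discounts):.2f}')
```

With $\gamma = 1$ the return grows without bound as episodes get longer, while $\gamma < 1$ bounds it by $1/(1-\gamma)$.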
To stabilize training, we also standardize the resulting sequence of returns (i.e. to have zero mean and unit standard deviation).
```py
def get_expected_return(
rewards: tf.Tensor,
gamma: float,
standardize: bool = True) -> tf.Tensor:
"""Compute expected returns per timestep."""
n = tf.shape(rewards)[0]
returns = tf.TensorArray(dtype=tf.float32, size=n)
# Start from the end of `rewards` and accumulate reward sums
# into the `returns` array
rewards = tf.cast(rewards[::-1], dtype=tf.float32)
discounted_sum = tf.constant(0.0)
discounted_sum_shape = discounted_sum.shape
for i in tf.range(n):
reward = rewards[i]
discounted_sum = reward + gamma * discounted_sum
discounted_sum.set_shape(discounted_sum_shape)
returns = returns.write(i, discounted_sum)
returns = returns.stack()[::-1]
if standardize:
returns = ((returns - tf.math.reduce_mean(returns)) /
(tf.math.reduce_std(returns) + eps))
return returns
```
### 3\. The actor-critic loss
Since we are using a hybrid actor-critic model, we will use a loss function that is a combination of the actor and critic losses for training, as shown below:
$$L = L_{actor} + L_{critic}$$
#### Actor loss
We formulate the actor loss based on [policy gradients with the critic as a state dependent baseline](https://www.youtube.com/watch?v=EKqxumCuAAY&t=62m23s) and compute single-sample (per-episode) estimates.
$$L_{actor} = -\sum^{T}_{t=1} \log\pi_{\theta}(a_{t} | s_{t})[G(s_{t}, a_{t}) - V^{\pi}_{\theta}(s_{t})]$$
where:
* $T$: the number of timesteps per episode, which can vary per episode
* $s_{t}$: the state at timestep $t$
* $a_{t}$: chosen action at timestep $t$ given state $s$
* $\pi_{\theta}$: is the policy (actor) parameterized by $\theta$
* $V^{\pi}_{\theta}$: is the value function (critic) also parameterized by $\theta$
* $G = G_{t}$: the expected return for a given state, action pair at timestep $t$
The sum is negated because we want to maximize the probabilities of actions yielding higher rewards by minimizing the combined loss.
##### Advantage
The $G - V$ term in our $L_{actor}$ formulation is called the [advantage](https://spinningup.openai.com/en/latest/spinningup/rl_intro.html#advantage-functions), which indicates how much better an action is given a particular state over a random action selected according to the policy $\pi$ for that state.
While it's possible to exclude a baseline, doing so may result in high variance during training. A nice property of choosing the critic $V$ as the baseline is that it is trained to be as close as possible to $G$, leading to lower variance.
In addition, without the critic, the algorithm would try to increase probabilities for actions taken on a particular state based on expected return, which may not make much of a difference if the relative probabilities between actions remain the same.
For instance, suppose that two actions for a given state would yield the same expected return. Without the critic, the algorithm would try to raise the probability of these actions based on the objective $J$. With the critic, it may turn out that there's no advantage ($G - V = 0$) and thus no benefit gained in increasing the actions' probabilities and the algorithm would set the gradients to zero.
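We can make this zero-advantage case concrete with toy numbers (purely illustrative values, not from the CartPole model):

```python
import numpy as np

# One state, two actions with equal expected return G under the current policy.
G = np.array([10.0, 10.0])
log_probs = np.log(np.array([0.5, 0.5]))

# Without a baseline, both actions receive the same positive push in the loss.
actor_terms_no_baseline = -log_probs * G
print(actor_terms_no_baseline)

# With the critic's estimate V = 10 as a baseline, the advantage is zero,
# so this state contributes nothing to the actor loss.
V = 10.0
actor_terms_with_baseline = -log_probs * (G - V)
print(actor_terms_with_baseline)  # [0. 0.]
```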
#### Critic loss
Training $V$ to be as close as possible to $G$ can be set up as a regression problem with the following loss function:
$$L_{critic} = L_{\delta}(G, V^{\pi}_{\theta})$$
where $L_{\delta}$ is the [Huber loss](https://en.wikipedia.org/wiki/Huber_loss), which is less sensitive to outliers in data than squared-error loss.
```py
huber_loss = tf.keras.losses.Huber(reduction=tf.keras.losses.Reduction.SUM)
def compute_loss(
action_probs: tf.Tensor,
values: tf.Tensor,
returns: tf.Tensor) -> tf.Tensor:
"""Computes the combined actor-critic loss."""
advantage = returns - values
action_log_probs = tf.math.log(action_probs)
actor_loss = -tf.math.reduce_sum(action_log_probs * advantage)
critic_loss = huber_loss(values, returns)
return actor_loss + critic_loss
```
### 4\. Defining the training step to update parameters
We combine all of the steps above into a training step that is run every episode. All steps leading up to the loss function are executed with the [`tf.GradientTape`](https://tensorflow.google.cn/api_docs/python/tf/GradientTape) context to enable automatic differentiation.
We use the Adam optimizer to apply the gradients to the model parameters.
We also compute the sum of the undiscounted rewards, `episode_reward`, in this step which would be used later on to evaluate if we have met the success criterion.
We apply the [`tf.function`](https://tensorflow.google.cn/api_docs/python/tf/function) context to the `train_step` function so that it can be compiled into a callable TensorFlow graph, which can lead to a 10x speedup in training.
```py
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
@tf.function
def train_step(
initial_state: tf.Tensor,
model: tf.keras.Model,
optimizer: tf.keras.optimizers.Optimizer,
gamma: float,
max_steps_per_episode: int) -> tf.Tensor:
"""Runs a model training step."""
with tf.GradientTape() as tape:
# Run the model for one episode to collect training data
action_probs, values, rewards = run_episode(
initial_state, model, max_steps_per_episode)
# Calculate expected returns
returns = get_expected_return(rewards, gamma)
# Convert training data to appropriate TF tensor shapes
action_probs, values, returns = [
tf.expand_dims(x, 1) for x in [action_probs, values, returns]]
# Calculating loss values to update our network
loss = compute_loss(action_probs, values, returns)
# Compute the gradients from the loss
grads = tape.gradient(loss, model.trainable_variables)
# Apply the gradients to the model's parameters
optimizer.apply_gradients(zip(grads, model.trainable_variables))
episode_reward = tf.math.reduce_sum(rewards)
return episode_reward
```
### 5\. Run the training loop
We execute training by running the training step until either the success criterion or the maximum number of episodes is reached.
We keep a running estimate of the average episode reward using an exponential moving average: each new episode's reward contributes 1% to the estimate and the previous estimate contributes 99%, which roughly corresponds to averaging over the last 100 episodes.
Depending on your runtime, training can finish in less than a minute.
```py
%%time
max_episodes = 10000
max_steps_per_episode = 1000
# Cartpole-v0 is considered solved if average reward is >= 195 over 100
# consecutive trials
reward_threshold = 195
running_reward = 0
# Discount factor for future rewards
gamma = 0.99
with tqdm.trange(max_episodes) as t:
for i in t:
initial_state = tf.constant(env.reset(), dtype=tf.float32)
episode_reward = int(train_step(
initial_state, model, optimizer, gamma, max_steps_per_episode))
running_reward = episode_reward*0.01 + running_reward*.99
t.set_description(f'Episode {i}')
t.set_postfix(
episode_reward=episode_reward, running_reward=running_reward)
# Show average episode reward every 10 episodes
if i % 10 == 0:
pass # print(f'Episode {i}: average reward: {avg_reward}')
if running_reward > reward_threshold:
break
print(f'\nSolved at episode {i}: average reward: {running_reward:.2f}!')
```
```py
Episode 1524: 15%| | 1524/10000 [08:16<46:00, 3.07it/s, episode_reward=200, running_reward=195]
Solved at episode 1524: average reward: 195.03!
CPU times: user 20min 43s, sys: 4min 52s, total: 25min 35s
Wall time: 8min 16s
```
## Visualization
After training, it would be good to visualize how the model performs in the environment. You can run the cells below to generate a GIF animation of one episode run of the model. Note that additional packages need to be installed for OpenAI Gym to render the environment's images correctly in Colab.
```py
# Render an episode and save as a GIF file
from IPython import display as ipythondisplay
from PIL import Image
from pyvirtualdisplay import Display
display = Display(visible=0, size=(400, 300))
display.start()
def render_episode(env: gym.Env, model: tf.keras.Model, max_steps: int):
screen = env.render(mode='rgb_array')
im = Image.fromarray(screen)
images = [im]
state = tf.constant(env.reset(), dtype=tf.float32)
for i in range(1, max_steps + 1):
state = tf.expand_dims(state, 0)
action_probs, _ = model(state)
action = np.argmax(np.squeeze(action_probs))
state, _, done, _ = env.step(action)
state = tf.constant(state, dtype=tf.float32)
# Render screen every 10 steps
if i % 10 == 0:
screen = env.render(mode='rgb_array')
images.append(Image.fromarray(screen))
if done:
break
return images
# Save GIF image
images = render_episode(env, model, max_steps_per_episode)
image_file = 'cartpole-v0.gif'
# loop=0: loop forever, duration=1: play each frame for 1ms
images[0].save(
image_file, save_all=True, append_images=images[1:], loop=0, duration=1)
```
```py
import tensorflow_docs.vis.embed as embed
embed.embed_file(image_file)
```
![gif](img/536f812a8cb3bafa44a738899b173733.png)
## Next steps
This tutorial demonstrated how to implement the actor-critic method using TensorFlow.
As a next step, you could try training a model on a different environment in OpenAI Gym.
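Whichever environment you pick, the bookkeeping for deciding when training has "solved" the task stays the same: an exponential moving average of episode rewards is compared against the environment's reward threshold (195 for CartPole-v0; other Gym environments expose theirs via `env.spec.reward_threshold`). A minimal pure-Python sketch of that smoothing, using an illustrative constant reward:

```python
def update_running_reward(running_reward, episode_reward, smoothing=0.99):
    """Exponentially weighted moving average of episode rewards."""
    return episode_reward * (1 - smoothing) + running_reward * smoothing

# Illustrative only: pretend every episode reaches CartPole-v0's max reward of 200.
running_reward = 0.0
for episode in range(600):
    running_reward = update_running_reward(running_reward, 200.0)

print(running_reward > 195)  # → True: the average crosses the "solved" threshold
```

With `smoothing=0.99`, roughly the last few hundred episodes dominate the average, which is why a single lucky episode cannot trigger the early exit in the training loop above.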
For additional information regarding actor-critic methods and the CartPole-v0 problem, you may refer to the following resources:
* [Actor Critic Method](https://hal.inria.fr/hal-00840470/document)
* [Actor Critic Lecture (CAL)](https://www.youtube.com/watch?v=EKqxumCuAAY&list=PLkFD6_40KJIwhWJpGazJ9VSj9CFMkb79A&index=7&t=0s)
* [Cartpole learning control problem [Barto, et al. 1983]](http://www.derongliu.org/adp/adp-cdrom/Barto1983.pdf)
For more reinforcement learning examples in TensorFlow, you can check the following resources:
* [Reinforcement learning code examples (keras.io)](https://keras.io/examples/rl/)
* [TF-Agents reinforcement learning library](https://tensorflow.google.cn/agents)


@@ -0,0 +1,138 @@
{
"metadata": {
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.0"
},
"orig_nbformat": 2,
"kernelspec": {
"name": "python380jvsc74a57bd038740d3277777e2cd7c6c2cc9d8addf5118fdf3f82b1b39231fd12aeac8aee8b",
"display_name": "Python 3.8.0 64-bit"
}
},
"nbformat": 4,
"nbformat_minor": 2,
"cells": [
{
"source": [
"## Notes\n",
"These are collected study notes, so a beginner's mindset is appropriate: just get familiar with what the tf.keras API offers and leave the low-level internals alone. Hopefully the federated module can also be used through tf.keras."
],
"cell_type": "markdown",
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# Load the MNIST handwritten-digit dataset\n",
"import tensorflow as tf\n",
"mnist = tf.keras.datasets.mnist\n",
"\n",
"(x_train, y_train), (x_test, y_test) = mnist.load_data()\n",
"x_train, x_test = x_train / 255.0, x_test / 255.0\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# Define a simple feedforward model with two Dense layers\n",
"model = tf.keras.models.Sequential([\n",
" tf.keras.layers.Flatten(input_shape=(28, 28)),\n",
" tf.keras.layers.Dense(128, activation='relu'),\n",
" tf.keras.layers.Dropout(0.2),\n",
" tf.keras.layers.Dense(10, activation='softmax')\n",
"])\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"# Use the sparse_categorical_crossentropy loss, appropriate when labels are single integer class indices,\n",
"# and choose the Adam optimizer.\n",
"model.compile(optimizer='adam',\n",
" loss='sparse_categorical_crossentropy',\n",
" metrics=['accuracy'])"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Epoch 1/5\n",
"1875/1875 [==============================] - 5s 790us/step - loss: 0.4760 - accuracy: 0.8624\n",
"Epoch 2/5\n",
"1875/1875 [==============================] - 2s 934us/step - loss: 0.1440 - accuracy: 0.9566\n",
"Epoch 3/5\n",
"1875/1875 [==============================] - 2s 914us/step - loss: 0.1083 - accuracy: 0.9683\n",
"Epoch 4/5\n",
"1875/1875 [==============================] - 2s 854us/step - loss: 0.0864 - accuracy: 0.9736\n",
"Epoch 5/5\n",
"1875/1875 [==============================] - 2s 1ms/step - loss: 0.0753 - accuracy: 0.9761\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<tensorflow.python.keras.callbacks.History at 0x193a340a130>"
]
},
"metadata": {},
"execution_count": 11
}
],
"source": [
"model.fit(x_train, y_train, epochs=5)\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"313/313 - 0s - loss: 0.0741 - accuracy: 0.9783\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"[0.07407715171575546, 0.9782999753952026]"
]
},
"metadata": {},
"execution_count": 12
}
],
"source": [
"model.evaluate(x_test, y_test, verbose=2)"
]
}
]
}

File diff suppressed because one or more lines are too long


@@ -0,0 +1,211 @@
{
"metadata": {
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.0"
},
"orig_nbformat": 2,
"kernelspec": {
"name": "python380jvsc74a57bd038740d3277777e2cd7c6c2cc9d8addf5118fdf3f82b1b39231fd12aeac8aee8b",
"display_name": "Python 3.8.0 64-bit"
}
},
"nbformat": 4,
"nbformat_minor": 2,
"cells": [
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Import TensorFlow\n",
"import tensorflow as tf\n",
"\n",
"from tensorflow.keras.layers import Dense, Flatten, Conv2D\n",
"from tensorflow.keras import Model"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# 1 Load the dataset\n",
"mnist = tf.keras.datasets.mnist\n",
"\n",
"(x_train, y_train), (x_test, y_test) = mnist.load_data()\n",
"x_train, x_test = x_train / 255.0, x_test / 255.0\n",
"\n",
"# Add a channels dimension\n",
"x_train = x_train[..., tf.newaxis]\n",
"x_test = x_test[..., tf.newaxis]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# 2 Data processing: shuffle the data and split it into batches\n",
"train_ds = tf.data.Dataset.from_tensor_slices(\n",
" (x_train, y_train)).shuffle(10000).batch(32)\n",
"test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(32)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"# 3.1 Define a model in an object-oriented style (model subclassing)\n",
"# For a multi-class problem the model usually has one output per class (e.g. 10 outputs for 10 classes). A softmax layer turns these into per-class probabilities, and the cross-entropy loss compares them with the target, which is conceptually a one-hot encoded 10-element vector.\n",
"class MyModel(Model):\n",
" def __init__(self):\n",
" super(MyModel, self).__init__()\n",
" self.conv1 = Conv2D(32, 3, activation='relu')\n",
" self.flatten = Flatten()\n",
" self.d1 = Dense(128, activation='relu')\n",
" self.d2 = Dense(10, activation='softmax')\n",
" # 3.2 Forward pass (the analogue of PyTorch's forward())\n",
" def call(self, x):\n",
" x = self.conv1(x)\n",
" x = self.flatten(x)\n",
" x = self.d1(x)\n",
" return self.d2(x)\n",
"\n",
"model = MyModel()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# 3.3 Choose the loss function and define the optimizer\n",
"loss_object = tf.keras.losses.SparseCategoricalCrossentropy()\n",
"\n",
"optimizer = tf.keras.optimizers.Adam()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# 4 Select evaluation metrics\n",
"train_loss = tf.keras.metrics.Mean(name='train_loss')\n",
"train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')\n",
"\n",
"test_loss = tf.keras.metrics.Mean(name='test_loss')\n",
"test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='test_accuracy')"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# 5 Define the training step (one gradient-descent update)\n",
"@tf.function\n",
"def train_step(images, labels):\n",
" with tf.GradientTape() as tape:\n",
" predictions = model(images)\n",
" loss = loss_object(labels, predictions)\n",
" gradients = tape.gradient(loss, model.trainable_variables)\n",
" optimizer.apply_gradients(zip(gradients, model.trainable_variables))\n",
"\n",
" train_loss(loss)\n",
" train_accuracy(labels, predictions)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"# 6 Define the test step\n",
"@tf.function\n",
"def test_step(images, labels):\n",
" predictions = model(images)\n",
" t_loss = loss_object(labels, predictions)\n",
"\n",
" test_loss(t_loss)\n",
" test_accuracy(labels, predictions)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Epoch 1, Loss: 0.136604443192482, Accuracy: 95.85000610351562, Test Loss: 0.07532097399234772, Test Accuracy: 97.58999633789062\n",
"Epoch 2, Loss: 0.04355545714497566, Accuracy: 98.5999984741211, Test Loss: 0.04925746098160744, Test Accuracy: 98.32999420166016\n",
"Epoch 3, Loss: 0.023197738453745842, Accuracy: 99.24666595458984, Test Loss: 0.05581401661038399, Test Accuracy: 98.30999755859375\n",
"Epoch 4, Loss: 0.013380118645727634, Accuracy: 99.54000091552734, Test Loss: 0.057846762239933014, Test Accuracy: 98.45999908447266\n",
"Epoch 5, Loss: 0.010646246373653412, Accuracy: 99.625, Test Loss: 0.07796232402324677, Test Accuracy: 98.1199951171875\n"
]
}
],
"source": [
"EPOCHS = 5\n",
"\n",
"for epoch in range(EPOCHS):\n",
" # Reset the metrics at the start of the next epoch\n",
" train_loss.reset_states()\n",
" train_accuracy.reset_states()\n",
" test_loss.reset_states()\n",
" test_accuracy.reset_states()\n",
"\n",
" for images, labels in train_ds:\n",
" train_step(images, labels)\n",
"\n",
" for test_images, test_labels in test_ds:\n",
" test_step(test_images, test_labels)\n",
"\n",
" template = 'Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, Test Accuracy: {}'\n",
" print (template.format(epoch+1,\n",
" train_loss.result(),\n",
" train_accuracy.result()*100,\n",
" test_loss.result(),\n",
" test_accuracy.result()*100))"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Model: \"my_model\"\n_________________________________________________________________\nLayer (type) Output Shape Param # \n=================================================================\nconv2d (Conv2D) multiple 320 \n_________________________________________________________________\nflatten (Flatten) multiple 0 \n_________________________________________________________________\ndense (Dense) multiple 2769024 \n_________________________________________________________________\ndense_1 (Dense) multiple 1290 \n=================================================================\nTotal params: 2,770,634\nTrainable params: 2,770,634\nNon-trainable params: 0\n_________________________________________________________________\nNone\n"
]
}
],
"source": [
"print(model.summary())"
]
}
]
}

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long


@@ -0,0 +1,358 @@
{
"metadata": {
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.0"
},
"orig_nbformat": 2,
"kernelspec": {
"name": "python380jvsc74a57bd038740d3277777e2cd7c6c2cc9d8addf5118fdf3f82b1b39231fd12aeac8aee8b",
"display_name": "Python 3.8.0 64-bit"
}
},
"nbformat": 4,
"nbformat_minor": 2,
"cells": [
{
"source": [
"# Data Pipeline Techniques"
],
"cell_type": "markdown",
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import functools\n",
"\n",
"import numpy as np\n",
"import pandas as pd \n",
"import tensorflow as tf\n",
"import tensorflow_datasets as tfds"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"TRAIN_DATA_URL = \"https://storage.googleapis.com/tf-datasets/titanic/train.csv\"\n",
"TEST_DATA_URL = \"https://storage.googleapis.com/tf-datasets/titanic/eval.csv\"\n",
"\n",
"train_file_path = tf.keras.utils.get_file(\"train.csv\", TRAIN_DATA_URL)\n",
"test_file_path = tf.keras.utils.get_file(\"eval.csv\", TEST_DATA_URL)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" survived sex age n_siblings_spouses parch fare class deck \\\n",
"0 0 male 22.0 1 0 7.2500 Third unknown \n",
"1 1 female 38.0 1 0 71.2833 First C \n",
"2 1 female 26.0 0 0 7.9250 Third unknown \n",
"3 1 female 35.0 1 0 53.1000 First C \n",
"4 0 male 28.0 0 0 8.4583 Third unknown \n",
"\n",
" embark_town alone \n",
"0 Southampton n \n",
"1 Cherbourg n \n",
"2 Southampton y \n",
"3 Southampton n \n",
"4 Queenstown y "
],
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>survived</th>\n <th>sex</th>\n <th>age</th>\n <th>n_siblings_spouses</th>\n <th>parch</th>\n <th>fare</th>\n <th>class</th>\n <th>deck</th>\n <th>embark_town</th>\n <th>alone</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>0</td>\n <td>male</td>\n <td>22.0</td>\n <td>1</td>\n <td>0</td>\n <td>7.2500</td>\n <td>Third</td>\n <td>unknown</td>\n <td>Southampton</td>\n <td>n</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1</td>\n <td>female</td>\n <td>38.0</td>\n <td>1</td>\n <td>0</td>\n <td>71.2833</td>\n <td>First</td>\n <td>C</td>\n <td>Cherbourg</td>\n <td>n</td>\n </tr>\n <tr>\n <th>2</th>\n <td>1</td>\n <td>female</td>\n <td>26.0</td>\n <td>0</td>\n <td>0</td>\n <td>7.9250</td>\n <td>Third</td>\n <td>unknown</td>\n <td>Southampton</td>\n <td>y</td>\n </tr>\n <tr>\n <th>3</th>\n <td>1</td>\n <td>female</td>\n <td>35.0</td>\n <td>1</td>\n <td>0</td>\n <td>53.1000</td>\n <td>First</td>\n <td>C</td>\n <td>Southampton</td>\n <td>n</td>\n </tr>\n <tr>\n <th>4</th>\n <td>0</td>\n <td>male</td>\n <td>28.0</td>\n <td>0</td>\n <td>0</td>\n <td>8.4583</td>\n <td>Third</td>\n <td>unknown</td>\n <td>Queenstown</td>\n <td>y</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {},
"execution_count": 3
}
],
"source": [
"# Preview the contents of the data; pandas is used only for display here, not for the actual loading.\n",
"df = pd.read_csv(train_file_path)\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# 1 Read the CSV data directly from the files and create a dataset\n",
"CSV_COLUMNS = ['survived', 'sex', 'age', 'n_siblings_spouses', 'parch', 'fare', 'class', 'deck', 'embark_town', 'alone']\n",
"\n",
"LABEL_COLUMN = 'survived'\n",
"LABELS = [0, 1]\n",
"\n",
"def get_dataset(file_path):\n",
" dataset = tf.data.experimental.make_csv_dataset(\n",
" file_path,\n",
"      batch_size=12, # A small value so the examples are easier to display\n",
" label_name=LABEL_COLUMN,\n",
" na_value=\"?\",\n",
" num_epochs=1,\n",
" ignore_errors=True)\n",
" return dataset\n",
"\n",
"raw_train_data = get_dataset(train_file_path)\n",
"raw_test_data = get_dataset(test_file_path)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"EXAMPLES: \n OrderedDict([('sex', <tf.Tensor: shape=(12,), dtype=string, numpy=\narray([b'male', b'female', b'female', b'male', b'male', b'male',\n b'female', b'male', b'female', b'male', b'male', b'female'],\n dtype=object)>), ('age', <tf.Tensor: shape=(12,), dtype=float32, numpy=\narray([34., 40., 28., 2., 56., 34., 38., 20., 28., 28., 37., 44.],\n dtype=float32)>), ('n_siblings_spouses', <tf.Tensor: shape=(12,), dtype=int32, numpy=array([0, 1, 0, 4, 0, 1, 1, 0, 0, 0, 0, 0])>), ('parch', <tf.Tensor: shape=(12,), dtype=int32, numpy=array([0, 1, 0, 1, 0, 0, 5, 0, 0, 0, 1, 0])>), ('fare', <tf.Tensor: shape=(12,), dtype=float32, numpy=\narray([ 6.4958, 39. , 79.2 , 39.6875, 26.55 , 26. , 31.3875,\n 7.8542, 7.75 , 39.6 , 29.7 , 27.7208], dtype=float32)>), ('class', <tf.Tensor: shape=(12,), dtype=string, numpy=\narray([b'Third', b'Second', b'First', b'Third', b'First', b'Second',\n b'Third', b'Third', b'Third', b'First', b'First', b'First'],\n dtype=object)>), ('deck', <tf.Tensor: shape=(12,), dtype=string, numpy=\narray([b'unknown', b'unknown', b'unknown', b'unknown', b'unknown',\n b'unknown', b'unknown', b'unknown', b'unknown', b'unknown', b'C',\n b'B'], dtype=object)>), ('embark_town', <tf.Tensor: shape=(12,), dtype=string, numpy=\narray([b'Southampton', b'Southampton', b'Cherbourg', b'Southampton',\n b'Southampton', b'Southampton', b'Southampton', b'Southampton',\n b'Queenstown', b'Cherbourg', b'Cherbourg', b'Cherbourg'],\n dtype=object)>), ('alone', <tf.Tensor: shape=(12,), dtype=string, numpy=\narray([b'y', b'n', b'y', b'n', b'y', b'n', b'n', b'y', b'y', b'y', b'n',\n b'y'], dtype=object)>)]) \n\nLABELS: \n tf.Tensor([0 1 1 0 0 0 1 0 1 0 0 1], shape=(12,), dtype=int32)\n"
]
}
],
"source": [
"# 1 Inspect the first batch of the loaded dataset\n",
"examples, labels = next(iter(raw_train_data)) # first batch\n",
"print(\"EXAMPLES: \\n\", examples, \"\\n\")\n",
"print(\"LABELS: \\n\", labels)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# 2 Preprocessing (see the structured-data section): convert text-valued categorical features to numbers\n",
"CATEGORIES = {\n",
" 'sex': ['male', 'female'],\n",
" 'class' : ['First', 'Second', 'Third'],\n",
" 'deck' : ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],\n",
"    'embark_town' : ['Cherbourg', 'Southampton', 'Queenstown'],\n",
" 'alone' : ['y', 'n']\n",
"}\n",
"\n",
"categorical_columns = []\n",
"for feature, vocab in CATEGORIES.items():\n",
" cat_col = tf.feature_column.categorical_column_with_vocabulary_list(\n",
" key=feature, vocabulary_list=vocab)\n",
" categorical_columns.append(tf.feature_column.indicator_column(cat_col))"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# 2 Preprocessing: standardize the continuous numeric features\n",
"def process_continuous_data(mean, data):\n",
"  # Standardize the data\n",
" data = tf.cast(data, tf.float32) * 1/(2*mean)\n",
" return tf.reshape(data, [-1, 1])\n",
"\n",
"\n",
"MEANS = {\n",
" 'age' : 29.631308,\n",
" 'n_siblings_spouses' : 0.545455,\n",
" 'parch' : 0.379585,\n",
" 'fare' : 34.385399\n",
"}\n",
"\n",
"numerical_columns = []\n",
"\n",
"for feature in MEANS.keys():\n",
" num_col = tf.feature_column.numeric_column(feature, normalizer_fn=functools.partial(process_continuous_data, MEANS[feature]))\n",
" numerical_columns.append(num_col)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# 2 Create the preprocessing layer from the feature columns\n",
"preprocessing_layer = tf.keras.layers.DenseFeatures(categorical_columns+numerical_columns)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# 3 Build the model\n",
"model = tf.keras.Sequential([\n",
" preprocessing_layer,\n",
" tf.keras.layers.Dense(128, activation='relu'),\n",
" tf.keras.layers.Dense(128, activation='relu'),\n",
" tf.keras.layers.Dense(1, activation='sigmoid'),\n",
"])\n",
"\n",
"model.compile(\n",
" loss='binary_crossentropy',\n",
" optimizer='adam',\n",
" metrics=['accuracy'])\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Epoch 1/20\n",
"WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'collections.OrderedDict'> input: OrderedDict([('sex', <tf.Tensor 'ExpandDims_8:0' shape=(None, 1) dtype=string>), ('age', <tf.Tensor 'ExpandDims:0' shape=(None, 1) dtype=float32>), ('n_siblings_spouses', <tf.Tensor 'ExpandDims_6:0' shape=(None, 1) dtype=int32>), ('parch', <tf.Tensor 'ExpandDims_7:0' shape=(None, 1) dtype=int32>), ('fare', <tf.Tensor 'ExpandDims_5:0' shape=(None, 1) dtype=float32>), ('class', <tf.Tensor 'ExpandDims_2:0' shape=(None, 1) dtype=string>), ('deck', <tf.Tensor 'ExpandDims_3:0' shape=(None, 1) dtype=string>), ('embark_town', <tf.Tensor 'ExpandDims_4:0' shape=(None, 1) dtype=string>), ('alone', <tf.Tensor 'ExpandDims_1:0' shape=(None, 1) dtype=string>)])\n",
"Consider rewriting this model with the Functional API.\n",
"WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'collections.OrderedDict'> input: OrderedDict([('sex', <tf.Tensor 'ExpandDims_8:0' shape=(None, 1) dtype=string>), ('age', <tf.Tensor 'ExpandDims:0' shape=(None, 1) dtype=float32>), ('n_siblings_spouses', <tf.Tensor 'ExpandDims_6:0' shape=(None, 1) dtype=int32>), ('parch', <tf.Tensor 'ExpandDims_7:0' shape=(None, 1) dtype=int32>), ('fare', <tf.Tensor 'ExpandDims_5:0' shape=(None, 1) dtype=float32>), ('class', <tf.Tensor 'ExpandDims_2:0' shape=(None, 1) dtype=string>), ('deck', <tf.Tensor 'ExpandDims_3:0' shape=(None, 1) dtype=string>), ('embark_town', <tf.Tensor 'ExpandDims_4:0' shape=(None, 1) dtype=string>), ('alone', <tf.Tensor 'ExpandDims_1:0' shape=(None, 1) dtype=string>)])\n",
"Consider rewriting this model with the Functional API.\n",
"53/53 [==============================] - 1s 2ms/step - loss: 0.5951 - accuracy: 0.6944\n",
"Epoch 2/20\n",
"53/53 [==============================] - 0s 1ms/step - loss: 0.4517 - accuracy: 0.7968\n",
"Epoch 3/20\n",
"53/53 [==============================] - 0s 1ms/step - loss: 0.4124 - accuracy: 0.8221\n",
"Epoch 4/20\n",
"53/53 [==============================] - 0s 2ms/step - loss: 0.3807 - accuracy: 0.8331\n",
"Epoch 5/20\n",
"53/53 [==============================] - 0s 1ms/step - loss: 0.4360 - accuracy: 0.8041\n",
"Epoch 6/20\n",
"53/53 [==============================] - 0s 1ms/step - loss: 0.3664 - accuracy: 0.8444\n",
"Epoch 7/20\n",
"53/53 [==============================] - 0s 1ms/step - loss: 0.3733 - accuracy: 0.8437\n",
"Epoch 8/20\n",
"53/53 [==============================] - 0s 1ms/step - loss: 0.3783 - accuracy: 0.8281\n",
"Epoch 9/20\n",
"53/53 [==============================] - 0s 1ms/step - loss: 0.4091 - accuracy: 0.8267\n",
"Epoch 10/20\n",
"53/53 [==============================] - 0s 997us/step - loss: 0.3433 - accuracy: 0.8412\n",
"Epoch 11/20\n",
"53/53 [==============================] - 0s 863us/step - loss: 0.3637 - accuracy: 0.8352\n",
"Epoch 12/20\n",
"53/53 [==============================] - 0s 959us/step - loss: 0.3832 - accuracy: 0.8338\n",
"Epoch 13/20\n",
"53/53 [==============================] - 0s 921us/step - loss: 0.3209 - accuracy: 0.8688\n",
"Epoch 14/20\n",
"53/53 [==============================] - 0s 1ms/step - loss: 0.3284 - accuracy: 0.8708\n",
"Epoch 15/20\n",
"53/53 [==============================] - 0s 1ms/step - loss: 0.3374 - accuracy: 0.8555\n",
"Epoch 16/20\n",
"53/53 [==============================] - 0s 997us/step - loss: 0.3747 - accuracy: 0.8372\n",
"Epoch 17/20\n",
"53/53 [==============================] - 0s 1ms/step - loss: 0.3610 - accuracy: 0.8367\n",
"Epoch 18/20\n",
"53/53 [==============================] - 0s 1ms/step - loss: 0.3312 - accuracy: 0.8789\n",
"Epoch 19/20\n",
"53/53 [==============================] - 0s 940us/step - loss: 0.3279 - accuracy: 0.8509\n",
"Epoch 20/20\n",
"53/53 [==============================] - 0s 921us/step - loss: 0.3229 - accuracy: 0.8603\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<tensorflow.python.keras.callbacks.History at 0x20fb36db8e0>"
]
},
"metadata": {},
"execution_count": 10
}
],
"source": [
"train_data = raw_train_data.shuffle(500)\n",
"test_data = raw_test_data\n",
"model.fit(train_data, epochs=20)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'collections.OrderedDict'> input: OrderedDict([('sex', <tf.Tensor 'ExpandDims_8:0' shape=(None, 1) dtype=string>), ('age', <tf.Tensor 'ExpandDims:0' shape=(None, 1) dtype=float32>), ('n_siblings_spouses', <tf.Tensor 'ExpandDims_6:0' shape=(None, 1) dtype=int32>), ('parch', <tf.Tensor 'ExpandDims_7:0' shape=(None, 1) dtype=int32>), ('fare', <tf.Tensor 'ExpandDims_5:0' shape=(None, 1) dtype=float32>), ('class', <tf.Tensor 'ExpandDims_2:0' shape=(None, 1) dtype=string>), ('deck', <tf.Tensor 'ExpandDims_3:0' shape=(None, 1) dtype=string>), ('embark_town', <tf.Tensor 'ExpandDims_4:0' shape=(None, 1) dtype=string>), ('alone', <tf.Tensor 'ExpandDims_1:0' shape=(None, 1) dtype=string>)])\n",
"Consider rewriting this model with the Functional API.\n",
"22/22 [==============================] - 0s 1ms/step - loss: 0.4477 - accuracy: 0.8182\n",
"\n",
"\n",
"Test Loss 0.4477050304412842, Test Accuracy 0.8181818127632141\n"
]
}
],
"source": [
"# 4 Evaluate the model\n",
"test_loss, test_accuracy = model.evaluate(test_data)\n",
"\n",
"print('\\n\\nTest Loss {}, Test Accuracy {}'.format(test_loss, test_accuracy))"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"WARNING:tensorflow:Layers in a Sequential model should only have a single input tensor, but we receive a <class 'collections.OrderedDict'> input: OrderedDict([('sex', <tf.Tensor 'ExpandDims_8:0' shape=(None, 1) dtype=string>), ('age', <tf.Tensor 'ExpandDims:0' shape=(None, 1) dtype=float32>), ('n_siblings_spouses', <tf.Tensor 'ExpandDims_6:0' shape=(None, 1) dtype=int32>), ('parch', <tf.Tensor 'ExpandDims_7:0' shape=(None, 1) dtype=int32>), ('fare', <tf.Tensor 'ExpandDims_5:0' shape=(None, 1) dtype=float32>), ('class', <tf.Tensor 'ExpandDims_2:0' shape=(None, 1) dtype=string>), ('deck', <tf.Tensor 'ExpandDims_3:0' shape=(None, 1) dtype=string>), ('embark_town', <tf.Tensor 'ExpandDims_4:0' shape=(None, 1) dtype=string>), ('alone', <tf.Tensor 'ExpandDims_1:0' shape=(None, 1) dtype=string>)])\n",
"Consider rewriting this model with the Functional API.\n",
"Predicted survival: 12.24% | Actual outcome: SURVIVED\n",
"Predicted survival: 10.38% | Actual outcome: SURVIVED\n",
"Predicted survival: 13.67% | Actual outcome: DIED\n",
"Predicted survival: 64.86% | Actual outcome: SURVIVED\n",
"Predicted survival: 4.28% | Actual outcome: DIED\n",
"Predicted survival: 27.77% | Actual outcome: DIED\n",
"Predicted survival: 13.21% | Actual outcome: DIED\n",
"Predicted survival: 8.61% | Actual outcome: SURVIVED\n",
"Predicted survival: 9.82% | Actual outcome: SURVIVED\n",
"Predicted survival: 60.03% | Actual outcome: SURVIVED\n"
]
}
],
"source": [
"predictions = model.predict(test_data)\n",
"\n",
"# Show some of the results\n",
"for prediction, survived in zip(predictions[:10], list(test_data)[0][1][:10]):\n",
" print(\"Predicted survival: {:.2%}\".format(prediction[0]),\n",
" \" | Actual outcome: \",\n",
" (\"SURVIVED\" if bool(survived) else \"DIED\"))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
]
}


@@ -0,0 +1,85 @@
{
"metadata": {
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": 3
},
"orig_nbformat": 2
},
"nbformat": 4,
"nbformat_minor": 2,
"cells": [
{
"source": [
"# Commonly used tf.data.Dataset APIs\n",
"\n",
"The tf.data API is designed around composable transformations, giving users flexibility. Many of these transformations commute, but the order of some of them matters for performance.\n",
"\n",
"## 1 map and batch\n",
"Invoking the user-defined function passed to the map transformation carries overhead for scheduling and executing that function. Normally this overhead is small compared with the computation the function performs. However, if the map does very little work, the overhead can account for a large share of the total cost. In that case it is recommended to vectorize the user-defined function (i.e. have it operate on a whole batch of inputs at once) and apply the batch transformation before map.\n",
"\n",
"## 2 map and cache\n",
"\n",
"The tf.data.Dataset.cache transformation can cache a dataset in memory or on local storage. If the user-defined function passed to map is expensive, apply the cache transformation after map, as long as the resulting dataset still fits in memory or local storage. If the function increases the space required to store the dataset beyond the cache capacity, consider preprocessing the data ahead of the training job to reduce resource consumption.\n",
"\n",
"## 3 map and interleave / prefetch / shuffle\n",
"A number of transformations, including map, interleave, prefetch, and shuffle, maintain an internal buffer of elements. If the user-defined function passed to map changes the size of the elements, then the ordering of the map transformation relative to the buffering transformations affects memory usage. In general, choose the order that results in the lower memory footprint, unless a different order is desirable for performance (for example, to fuse the map and batch transformations).\n",
"\n",
"## 4 repeat and shuffle\n",
"The tf.data.Dataset.repeat transformation repeats the input data a finite (or infinite) number of times; each repetition of the data is typically called an epoch. The tf.data.Dataset.shuffle transformation randomizes the order of the dataset's examples.\n",
"\n",
"If repeat is applied before shuffle, epoch boundaries are blurred: some elements may be repeated before other elements have appeared even once. On the other hand, if shuffle is applied before repeat, performance may slow down at the start of each epoch because the internal state of the shuffle transformation must be initialized. In other words, repeat-before-shuffle gives better performance, while shuffle-before-repeat gives stronger ordering guarantees.\n",
"\n",
"If possible, use the fused tf.contrib.data.shuffle_and_repeat transformation, which offers the best of both worlds (good performance and strong ordering guarantees). Otherwise, shuffle before repeat."
],
"cell_type": "markdown",
"metadata": {}
},
{
"source": [
"# Pipeline structure\n",
"\n",
"A typical TensorFlow training input pipeline can be viewed as an ETL process:\n",
"\n",
"1. Extract: read data from persistent storage, either local (e.g. HDD or SSD) or remote (e.g. GCS or HDFS).\n",
"2. Transform: use CPU cores to parse the data and perform preprocessing operations such as image decompression, data-augmentation transforms (random crops, flips, color distortion), shuffling, and batching.\n",
"3. Load: load the transformed data onto the accelerator device(s) (e.g. GPUs or TPUs) that execute the machine learning model."
],
"cell_type": "markdown",
"metadata": {}
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Notes on tf.data\n",
"\n",
"## Ways of loading data\n",
"### From memory\n",
"Load the data into memory with, for example, numpy.load() or pandas.read_csv(), then bring it into TensorFlow with tf.data.Dataset.from_tensors() or tf.data.Dataset.from_tensor_slices().\n",
"### From files\n",
"tf.data.TFRecordDataset()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Example: a data pipeline reading from multiple files (not a pipeline in the most general sense)\n"
]
}
]
}


@@ -0,0 +1,242 @@
{
"metadata": {
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.0"
},
"orig_nbformat": 2,
"kernelspec": {
"name": "python380jvsc74a57bd038740d3277777e2cd7c6c2cc9d8addf5118fdf3f82b1b39231fd12aeac8aee8b",
"display_name": "Python 3.8.0 64-bit"
}
},
"nbformat": 4,
"nbformat_minor": 2,
"cells": [
{
"source": [
"# Classifying Structured Data"
],
"cell_type": "markdown",
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"import tensorflow as tf\n",
"\n",
"from tensorflow import feature_column\n",
"from tensorflow.keras import layers\n",
"from sklearn.model_selection import train_test_split"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# 1 Load the data\n",
"dataframe = pd.read_csv(\"simple.csv\")\n",
"dataframe.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# 2 Preprocess the data: split it into train/validation/test sets (roughly 6:2:2). Note that sklearn is used for the split here as well.\n",
"train, test = train_test_split(dataframe, test_size=0.2)\n",
"train, val = train_test_split(train, test_size=0.2)\n",
"print(len(train), 'train examples')\n",
"print(len(val), 'validation examples')\n",
"print(len(test), 'test examples')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# 3 Create the input pipeline: convert the data into TensorFlow tensors\n",
"# A utility method that creates a tf.data dataset from a Pandas DataFrame. Wrapping the DataFrame with tf.data lets us use feature columns as a bridge that maps the DataFrame's columns to the features used to train the model. For a very large CSV file (too large to fit in memory), we would use tf.data to read it from disk directly.\n",
"def df_to_dataset(dataframe, shuffle=True, batch_size=32):\n",
" dataframe = dataframe.copy()\n",
" labels = dataframe.pop('Class')\n",
"# print(pd.value_counts(labels))\n",
" ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))\n",
" if shuffle:\n",
" ds = ds.shuffle(buffer_size=len(dataframe))\n",
" ds = ds.batch(batch_size)\n",
" return ds"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"batch_size = 60 # a small batch size for demonstration\n",
"train_ds = df_to_dataset(train, batch_size=batch_size)\n",
"val_ds = df_to_dataset(val, shuffle=False, batch_size=batch_size)\n",
"test_ds = df_to_dataset(test, shuffle=False, batch_size=batch_size)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Inspect the data in the input pipeline\n",
"for feature_batch, label_batch in train_ds.take(1):\n",
" print('Every feature:', list(feature_batch.keys()))\n",
" print('A batch of rename:', feature_batch['rename'])\n",
" print('A batch of targets:', label_batch )"
]
},
{
"source": [
"## Handling several kinds of features\n",
"### Numeric columns\n",
"* Passed through directly.\n",
"```\n",
"age = feature_column.numeric_column(\"age\")\n",
"```\n",
"### Bucketized columns\n",
"* Consider raw data representing a person's age. Instead of representing age as a numeric column, we can split it into several buckets using a bucketized column.\n",
"```\n",
"age_buckets = feature_column.bucketized_column(age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])\n",
"```\n",
"### Categorical columns\n",
"* thal is represented as a string (e.g. 'fixed', 'normal', or 'reversible'). Strings cannot be fed to a model directly; they must first be mapped to numeric values. Categorical vocabulary columns provide a way to represent strings as one-hot vectors (much like the age buckets above). The vocabulary can be passed as a list with categorical_column_with_vocabulary_list, or loaded from a file with categorical_column_with_vocabulary_file.\n",
"```\n",
"thal = feature_column.categorical_column_with_vocabulary_list(\n",
"      'thal', ['fixed', 'normal', 'reversible'])\n",
"\n",
"thal_one_hot = feature_column.indicator_column(thal)\n",
"```\n",
"### Embedding columns\n",
"* Suppose that instead of just a few possible strings, each category had thousands (or more) of values. For a number of reasons, as the number of categories grows, training a neural network with one-hot encodings becomes infeasible. Embedding columns overcome this limitation: instead of a high-dimensional one-hot vector, an embedding column represents the data as a low-dimensional dense vector whose entries can be any number, not just 0 or 1. The embedding size (8 in the example below) is a parameter that must be tuned.\n",
"* Key point: when a categorical column has many possible values, an embedding column is usually the best choice. It is used here for demonstration purposes, so you have a complete example you can adapt to other datasets in the future.\n",
"```\n",
"# Note that the input to the embedding column is the categorical column we created earlier\n",
"thal_embedding = feature_column.embedding_column(thal, dimension=8)\n",
"```\n",
"\n",
"### Hashed feature columns\n",
"* Another way to represent a categorical column with a large number of values is categorical_column_with_hash_bucket. This feature column computes a hash of the input and then selects one of hash_bucket_size buckets to encode the string. With this column you do not need to provide the vocabulary, and you can make the number of hash_buckets much smaller than the number of actual categories to save space.\n",
"* Key point: an important downside of this technique is that collisions are possible, with different strings mapped to the same bucket. In practice, hashed feature columns nevertheless work well for some datasets.\n",
"```\n",
"thal_hashed = feature_column.categorical_column_with_hash_bucket(\n",
"      'thal', hash_bucket_size=1000)\n",
"```\n",
"### Crossed feature columns\n",
"* Combining several features into a single feature, known as a feature cross, lets the model learn a separate weight for each combination of features. Here we create a new feature crossing age and thal. Note that crossed_column does not build the full table of all possible combinations (which could be very large); it is backed by a hashed_column, so you can choose the table size.\n",
"```\n",
"crossed_feature = feature_column.crossed_column([age_buckets, thal], hash_bucket_size=1000)\n",
"demo(feature_column.indicator_column(crossed_feature))\n",
"```"
],
"cell_type": "markdown",
"metadata": {}
},
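{
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "The `demo` helper used above is not defined in this notebook. A plausible sketch (it assumes `example_batch` holds one batch of features taken from `train_ds`):\n",
  "```\n",
  "# Hypothetical helper: apply a single feature column to one batch of examples\n",
  "def demo(feature_column):\n",
  "    feature_layer = tf.keras.layers.DenseFeatures(feature_column)\n",
  "    print(feature_layer(example_batch).numpy())\n",
  "```"
 ]
},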
{
"source": [
"## Choose which columns to use\n",
"```\n",
"feature_columns = []\n",
"\n",
"# numeric columns\n",
"for header in ['age', 'trestbps', 'chol', 'thalach', 'oldpeak', 'slope', 'ca']:\n",
" feature_columns.append(feature_column.numeric_column(header))\n",
"\n",
"# bucketized columns\n",
"age_buckets = feature_column.bucketized_column(age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])\n",
"feature_columns.append(age_buckets)\n",
"\n",
"# categorical indicator columns\n",
"thal = feature_column.categorical_column_with_vocabulary_list(\n",
" 'thal', ['fixed', 'normal', 'reversible'])\n",
"thal_one_hot = feature_column.indicator_column(thal)\n",
"feature_columns.append(thal_one_hot)\n",
"\n",
"# embedding columns\n",
"thal_embedding = feature_column.embedding_column(thal, dimension=8)\n",
"feature_columns.append(thal_embedding)\n",
"\n",
"# crossed columns\n",
"crossed_feature = feature_column.crossed_column([age_buckets, thal], hash_bucket_size=1000)\n",
"crossed_feature = feature_column.indicator_column(crossed_feature)\n",
"feature_columns.append(crossed_feature)\n",
"```"
],
"cell_type": "markdown",
"metadata": {}
},
{
"source": [
"## Build a feature layer\n",
"Now that we have defined our feature columns, we will use a DenseFeatures layer to feed them into our Keras model.\n",
"```\n",
"feature_layer = tf.keras.layers.DenseFeatures(feature_columns)\n",
"```"
],
"cell_type": "markdown",
"metadata": {}
},
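{
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "Once built, the feature layer maps a dict of raw feature tensors to a single dense tensor. A sketch (again assuming `example_batch` is one batch of features from `train_ds`):\n",
  "```\n",
  "dense_batch = feature_layer(example_batch)\n",
  "print(dense_batch.shape)  # (batch_size, total width of all feature columns)\n",
  "```"
 ]
},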
{
"source": [
"## Use the feature layer\n",
"```\n",
"model = tf.keras.Sequential([\n",
" feature_layer,\n",
" layers.Dense(128, activation='relu'),\n",
" layers.Dense(128, activation='relu'),\n",
" layers.Dense(1, activation='sigmoid')\n",
"])\n",
"\n",
"model.compile(optimizer='adam',\n",
" loss='binary_crossentropy',\n",
" metrics=['accuracy'],\n",
" run_eagerly=True)\n",
"\n",
"model.fit(train_ds,\n",
" validation_data=val_ds,\n",
" epochs=5)\n",
"```"
],
"cell_type": "markdown",
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Evaluate the model\n",
"loss, accuracy = model.evaluate(test_ds)\n",
"print(\"Accuracy\", accuracy)"
]
}
]
}


@@ -1,3 +0,0 @@
# TensorFlow 2.4 Official Tutorials
Source: [https://tensorflow.google.cn/tutorials?authuser=0](https://tensorflow.google.cn/tutorials?authuser=0)


@@ -1,71 +0,0 @@
+ [TensorFlow 2.4 Official Tutorials](README.md)
+ [TensorFlow 2.0 quickstart for beginners](002.md)
+ [TensorFlow 2.0 quickstart for experts](003.md)
+ [Beginner](004.md)
+ [ML basics with Keras](005.md)
+ [Basic classification: Classify images of clothing](006.md)
+ [Text classification of movie reviews](007.md)
+ [Text classification of movie reviews with Keras and TensorFlow Hub](008.md)
+ [Basic regression: Predict fuel efficiency](009.md)
+ [Overfit and underfit](010.md)
+ [Save and restore models](011.md)
+ [Introduction to the Keras Tuner](012.md)
+ [Load and preprocess data](013.md)
+ [Load images with tf.data](014.md)
+ [Load text with tf.data](015.md)
+ [Load CSV data with tf.data](016.md)
+ [Load NumPy data with tf.data](017.md)
+ [Load pandas dataframes with tf.data](018.md)
+ [Unicode strings](019.md)
+ [TF.Text](020.md)
+ [TFRecord and tf.Example](021.md)
+ [Estimator](022.md)
+ [Premade Estimators](023.md)
+ [Build a linear model with Estimators](024.md)
+ [Training Boosted Trees models in TensorFlow](025.md)
+ [Gradient Boosted Trees: model understanding](026.md)
+ [Creating an Estimator from a Keras model](027.md)
+ [Advanced](028.md)
+ [Customization](029.md)
+ [Customization basics: tensors and operations](030.md)
+ [Custom layers](031.md)
+ [Custom training: walkthrough](032.md)
+ [Distributed training](033.md)
+ [Distributed training with Keras](034.md)
+ [Custom training with tf.distribute.Strategy](035.md)
+ [Multi-worker training with Keras](036.md)
+ [Multi-worker training with Estimator](037.md)
+ [Save and load models with distribution strategies](038.md)
+ [Distributed Input](039.md)
+ [Images](040.md)
+ [Convolutional Neural Network (CNN)](041.md)
+ [Image classification](042.md)
+ [Transfer learning and fine-tuning](043.md)
+ [Transfer learning with TensorFlow Hub](044.md)
+ [Data augmentation](045.md)
+ [Image segmentation](046.md)
+ [Text](047.md)
+ [Word embeddings](048.md)
+ [Text classification with an RNN](049.md)
+ [Text generation with an RNN](050.md)
+ [Neural machine translation with attention](051.md)
+ [Image captioning with visual attention](052.md)
+ [Transformer model for language understanding](053.md)
+ [Fine-tuning a BERT model](054.md)
+ [Structured data](055.md)
+ [Classify structured data](056.md)
+ [Classification on imbalanced data](057.md)
+ [Time series forecasting](058.md)
+ [Generative](059.md)
+ [Neural style transfer](060.md)
+ [DeepDream](061.md)
+ [Deep Convolutional Generative Adversarial Network](062.md)
+ [Pix2Pix](063.md)
+ [CycleGAN](064.md)
+ [Adversarial example using FGSM](065.md)
+ [Intro to Autoencoders](066.md)
+ [Convolutional Variational Autoencoder](067.md)
+ [Interpretability](068.md)
+ [Integrated gradients](069.md)
+ [Reinforcement learning](070.md)
+ [Playing CartPole with the Actor-Critic Method](071.md)
