Appendix: Batch Training

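Very large datasets may not fit in the memory allocated to your process. In the previous steps, we set up a pipeline that loads the entire dataset into memory, prepares the data, and passes the working set to the training function. As an alternative, Keras provides a training function (fit_generator) that pulls the data in batches. This lets us apply the transformations in the data pipeline to only a small part of the data (a multiple of batch_size) at a time. In our experiments, we used batching (code in GitHub) for datasets such as DBPedia, Amazon reviews, Ag news, and Yelp reviews.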
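As a rough illustration of this idea (not the guide's actual pipeline), the sketch below applies a transformation lazily, one batch at a time, so that only one batch of transformed samples exists in memory at once. The function `batched_transform` and the toy `to_lengths` transform are hypothetical names introduced for this example; a real pipeline would substitute its own vectorization step.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def batched_transform(
    texts: Sequence[str],
    labels: Sequence[int],
    transform: Callable[[Sequence[str]], List[List[int]]],
    batch_size: int,
) -> Iterator[Tuple[List[List[int]], List[int]]]:
    """Yields (features, labels) one batch at a time, for a single pass."""
    for start in range(0, len(texts), batch_size):
        end = min(start + batch_size, len(texts))
        # The transform runs only on the current slice, never on the full dataset.
        yield transform(texts[start:end]), list(labels[start:end])


def to_lengths(batch: Sequence[str]) -> List[List[int]]:
    """Toy stand-in for vectorization: [character count, word count] per text."""
    return [[len(t), len(t.split())] for t in batch]


texts = ["good movie", "bad", "really great film", "meh", "not for me"]
labels = [1, 0, 1, 0, 0]

batches = list(batched_transform(texts, labels, to_lengths, batch_size=2))
for features, batch_labels in batches:
    print(features, batch_labels)
```

Because the final batch may be smaller than `batch_size`, any consumer of these batches should not assume a fixed batch shape.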

The following code illustrates how to generate data batches and feed them to fit_generator.

def _data_generator(x, y, num_features, batch_size):
    """Generates batches of vectorized texts for training/validation.

    # Arguments
        x: np.matrix, feature matrix.
        y: np.ndarray, labels.
        num_features: int, number of features (not used by this generator).
        batch_size: int, number of samples per batch.

    # Returns
        Yields feature and label data in batches.
    """
    num_samples = x.shape[0]
    num_batches = num_samples // batch_size
    if num_samples % batch_size:
        num_batches += 1

    while True:  # Loop forever; Keras stops after steps_per_epoch batches.
        for i in range(num_batches):
            start_idx = i * batch_size
            end_idx = (i + 1) * batch_size
            if end_idx > num_samples:
                end_idx = num_samples
            x_batch = x[start_idx:end_idx]
            y_batch = y[start_idx:end_idx]
            yield x_batch, y_batch

# Create training and validation generators.
training_generator = _data_generator(
    x_train, train_labels, num_features, batch_size)
validation_generator = _data_generator(
    x_val, val_labels, num_features, batch_size)

# Get number of training steps. This indicates the number of steps it takes
# to cover all samples in one epoch.
steps_per_epoch = x_train.shape[0] // batch_size
if x_train.shape[0] % batch_size:
    steps_per_epoch += 1

# Get number of validation steps.
validation_steps = x_val.shape[0] // batch_size
if x_val.shape[0] % batch_size:
    validation_steps += 1

# Train and validate model.
history = model.fit_generator(
    generator=training_generator,
    steps_per_epoch=steps_per_epoch,
    validation_data=validation_generator,
    validation_steps=validation_steps,
    callbacks=callbacks,
    epochs=epochs,
    verbose=2)  # Logs once per epoch.
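
The generator above never terminates, which is why Keras has to be told how many batches make up one epoch. The sketch below mirrors the same slicing logic on plain Python lists (the name `cycle_batches` is hypothetical, introduced only for this illustration) and checks that the rounded-up step count covers every sample exactly once before the generator wraps around.

```python
import itertools


def cycle_batches(samples, batch_size):
    """Endlessly yields consecutive slices; the last slice of a pass may be shorter."""
    while True:
        for start in range(0, len(samples), batch_size):
            yield samples[start:start + batch_size]


samples = list(range(10))
batch_size = 4

# Ceiling division: same result as floor-dividing and adding 1 on a remainder.
steps_per_epoch = -(-len(samples) // batch_size)

gen = cycle_batches(samples, batch_size)
first_epoch = list(itertools.islice(gen, steps_per_epoch))
next_batch = next(gen)  # The generator has wrapped around to the start.

print(steps_per_epoch)
print(first_epoch)
print(next_batch)
```

If the step count were rounded down instead, the trailing partial batch would be skipped in the first epoch and every later epoch would start at a shifted offset. In recent TensorFlow versions, where fit_generator is deprecated, the same generator and step counts can be passed to model.fit instead.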
