Appendix: Batch Training
Very large datasets may not fit in the memory allocated to your process. In the previous steps, we set up a pipeline where we bring the entire dataset into memory, prepare the data, and pass the working set to the training function. Instead, Keras provides an alternative training function (fit_generator) that pulls the data in batches. This allows us to apply the transformations in the data pipeline to only a small part (a multiple of batch_size) of the data. During our experiments, we used batching (code in GitHub) for datasets such as DBPedia, Amazon reviews, Ag news, and Yelp reviews.
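To make the memory pressure concrete, here is a rough back-of-envelope calculation; the sample and feature counts are illustrative assumptions, not figures from the guide:

import numpy as np

# Hypothetical sizes: 1 million samples, 20,000 n-gram features.
num_samples = 1_000_000
num_features = 20_000
bytes_per_value = np.dtype('float64').itemsize  # 8 bytes

dense_size_gb = num_samples * num_features * bytes_per_value / 1e9
print(f'Dense feature matrix: ~{dense_size_gb:.0f} GB')  # ~160 GB

A dense matrix that size is far beyond typical RAM, which is why vectorizing and feeding only one batch at a time matters.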
The following code illustrates how to generate data batches and feed them to fit_generator.
def _data_generator(x, y, num_features, batch_size):
    """Generates batches of vectorized texts for training/validation.

    # Arguments
        x: np.matrix, feature matrix.
        y: np.ndarray, labels.
        num_features: int, number of features.
        batch_size: int, number of samples per batch.

    # Returns
        Yields feature and label data in batches.
    """
    num_samples = x.shape[0]
    num_batches = num_samples // batch_size
    if num_samples % batch_size:
        num_batches += 1

    while 1:
        for i in range(num_batches):
            start_idx = i * batch_size
            end_idx = (i + 1) * batch_size
            if end_idx > num_samples:
                end_idx = num_samples
            x_batch = x[start_idx:end_idx]
            y_batch = y[start_idx:end_idx]
            yield x_batch, y_batch
# Create training and validation generators.
training_generator = _data_generator(
    x_train, train_labels, num_features, batch_size)
validation_generator = _data_generator(
    x_val, val_labels, num_features, batch_size)
# Get number of training steps. This indicates the number of steps it takes
# to cover all samples in one epoch.
steps_per_epoch = x_train.shape[0] // batch_size
if x_train.shape[0] % batch_size:
    steps_per_epoch += 1
# Get number of validation steps.
validation_steps = x_val.shape[0] // batch_size
if x_val.shape[0] % batch_size:
    validation_steps += 1

# Train and validate model.
history = model.fit_generator(
    generator=training_generator,
    steps_per_epoch=steps_per_epoch,
    validation_data=validation_generator,
    validation_steps=validation_steps,
    callbacks=callbacks,
    epochs=epochs,
    verbose=2)  # Logs once per epoch.
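As a side note, fit_generator has since been deprecated in the Keras versions bundled with TensorFlow 2: Model.fit now accepts Python generators directly. Below is a minimal, self-contained sketch of the same batch-training pattern with Model.fit; the toy model and random data are placeholders for illustration, not the guide's actual model:

import numpy as np
import tensorflow as tf

num_samples, num_features, batch_size = 1000, 20, 32

# Toy stand-ins for the vectorized texts and labels.
x_train = np.random.rand(num_samples, num_features).astype('float32')
train_labels = np.random.randint(0, 2, size=(num_samples, 1)).astype('float32')

def _data_generator(x, y, batch_size):
    """Yields (features, labels) batches indefinitely, as above."""
    num_batches = -(-x.shape[0] // batch_size)  # ceiling division
    while True:
        for i in range(num_batches):
            yield (x[i * batch_size:(i + 1) * batch_size],
                   y[i * batch_size:(i + 1) * batch_size])

# A tiny placeholder model.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(num_features,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

# Model.fit consumes the generator directly; steps_per_epoch tells it
# how many batches make up one epoch.
history = model.fit(
    _data_generator(x_train, train_labels, batch_size),
    steps_per_epoch=-(-num_samples // batch_size),
    epochs=2,
    verbose=2)  # Logs once per epoch.

For multi-worker data loading, tf.keras.utils.Sequence or tf.data.Dataset is the usual replacement for raw Python generators.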