Production ML systems: Deployment testing
You're ready to deploy the unicorn model that predicts unicorn appearances! When deploying, your machine learning (ML) pipeline should run, update, and serve without a problem. If only deploying a model were as easy as pressing a big "Deploy" button. Unfortunately, a full machine learning system requires tests for:
- Validating input data.
- Validating feature engineering.
- Validating the quality of new model versions.
- Validating serving infrastructure.
- Testing integration between pipeline components.
Many software engineers favor test-driven development (TDD). In TDD, software engineers write tests prior to writing the "real" source code. However, TDD can be tricky in machine learning. For example, before training your model, you can't write a test that validates the loss. Instead, you must first discover the achievable loss during model development and then test new model versions against that achievable loss.
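As a minimal sketch of that idea, the pytest-style check below pins a candidate model's validation loss to a previously discovered achievable loss. Everything here is illustrative: the synthetic data, the `train_and_evaluate` helper, and the 0.40 baseline are assumptions, not part of the unicorn project.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Loss shown to be achievable during model development (assumed value).
ACHIEVABLE_LOSS = 0.40
TOLERANCE = 0.02  # slack for run-to-run noise


def train_and_evaluate(seed: int = 0) -> float:
    """Stand-in for the real training entry point: trains a candidate
    model on synthetic data and returns its validation loss."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(2000, 8))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 0).astype(int)
    X_train, X_val, y_train, y_val = X[:1500], X[1500:], y[:1500], y[1500:]
    model = LogisticRegression(max_iter=200).fit(X_train, y_train)
    return log_loss(y_val, model.predict_proba(X_val))


def test_new_version_meets_achievable_loss():
    # The candidate may do better, but must not regress past the loss
    # that earlier development already showed is achievable.
    assert train_and_evaluate() <= ACHIEVABLE_LOSS + TOLERANCE
```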
About the unicorn model
This section refers to the unicorn model. Here's what you need to know:
You are using machine learning to build a classification model that predicts unicorn appearances. Your dataset details 10,000 unicorn appearances and 10,000 unicorn non-appearances. The dataset contains the location, time of day, elevation, temperature, humidity, tree cover, presence of a rainbow, and several other features.
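For concreteness, the sketch below shows what a couple of rows of such a dataset might look like; every column name and value is an illustrative assumption rather than the actual dataset schema.

```python
import pandas as pd

# Illustrative only: hypothetical columns and values for one appearance
# (unicorn_appeared=1) and one non-appearance (unicorn_appeared=0).
examples = pd.DataFrame([
    {"latitude": 46.6, "longitude": 8.0, "time_of_day": "dawn",
     "elevation_m": 1800, "temperature_c": 4.5, "humidity": 0.82,
     "tree_cover": 0.35, "rainbow_present": True, "unicorn_appeared": 1},
    {"latitude": 40.7, "longitude": -74.0, "time_of_day": "noon",
     "elevation_m": 10, "temperature_c": 28.0, "humidity": 0.40,
     "tree_cover": 0.05, "rainbow_present": False, "unicorn_appeared": 0},
])
print(examples.dtypes)
```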
Test model updates with reproducible training
Perhaps you want to continue improving your unicorn model. For example, suppose you do some additional feature engineering on a certain feature and then retrain the model, hoping to get better (or at least the same) results. Unfortunately, it is sometimes difficult to reproduce model training. To improve reproducibility, follow these recommendations:
- Deterministically seed the random number generator. For details, see [randomization in data generation](/machine-learning/crash-course/production-ml-systems/monitoring#randomization).
- Initialize model components in a fixed order so that the components get the same random numbers from the random number generator on every run. ML libraries typically handle this requirement automatically.
- Take the average of several runs of the model.
- Use version control, even for preliminary iterations, so that you can pinpoint code and parameters when investigating your model or pipeline.
Even after following these guidelines, other sources of nondeterminism might still exist.
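Here is a minimal sketch of the first three recommendations, assuming NumPy and scikit-learn; the seed value, the synthetic data, and the choice of five runs are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

SEED = 42  # deterministic seed (illustrative value)

# Deterministically seeded data generation stands in for the real dataset.
rng = np.random.default_rng(SEED)
X = rng.normal(size=(5000, 8))
y = (X[:, 0] - X[:, 2] + rng.normal(scale=0.5, size=5000) > 0).astype(int)


def train_once(seed: int) -> float:
    """One run with a fixed split seed and a fixed model seed."""
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    model = LogisticRegression(max_iter=200, random_state=seed).fit(X_tr, y_tr)
    return roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])


# Average several runs to smooth out any remaining nondeterminism.
scores = [train_once(SEED + i) for i in range(5)]
print(f"mean validation AUC over 5 runs: {np.mean(scores):.4f}")
```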
Test calls to the machine learning API
How do you test updates to API calls? You could retrain your model, but that's time intensive. Instead, write a unit test that generates random input data and runs a single step of gradient descent. If that step completes without errors, then updates to the API probably haven't broken your model.
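Here's a hedged sketch of such a unit test, assuming a small Keras model; the architecture, input shape, and batch size are placeholders for your own.

```python
import numpy as np
import tensorflow as tf


def build_model() -> tf.keras.Model:
    """Placeholder for the real unicorn model architecture."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(8,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="sgd", loss="binary_crossentropy")
    return model


def test_single_gradient_step_runs():
    # Random inputs are enough: the goal is only to confirm that one step
    # of gradient descent runs through the (possibly updated) API calls
    # without raising errors.
    rng = np.random.default_rng(0)
    features = rng.normal(size=(32, 8)).astype("float32")
    labels = rng.integers(0, 2, size=(32, 1)).astype("float32")

    loss = build_model().train_on_batch(features, labels)
    assert np.isfinite(loss)
```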
Write integration tests for pipeline components
In an ML pipeline, changes in one component can cause errors in other components. Check that components work together by writing an integration test that runs the entire pipeline end-to-end.
Besides running integration tests continuously, run them when pushing new models and new software versions. The slowness of running the entire pipeline makes continuous integration testing harder. To run integration tests faster, train on a subset of the data or with a simpler model; the details depend on your model and data. For continuous coverage, adjust the faster tests so that they run with every new version of the model or software, while the slower tests keep running in the background.
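A hedged sketch of a fast end-to-end check along those lines, using a small sample and a simple model; the data loader, sample fraction, and sanity bound are all assumptions standing in for your real pipeline stages.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def load_training_data(sample_fraction: float, seed: int = 0):
    """Hypothetical data loader; the real one would read from your data store."""
    rng = np.random.default_rng(seed)
    n = int(20_000 * sample_fraction)
    X = rng.normal(size=(n, 8))
    y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y


def test_pipeline_end_to_end_on_small_sample():
    # Keep the test fast by training on a small sample with a simple model,
    # while still exercising every stage: load -> features -> train -> eval.
    X, y = load_training_data(sample_fraction=0.05)
    split = int(0.8 * len(X))
    pipeline = Pipeline([
        ("features", StandardScaler()),               # feature engineering stage
        ("model", LogisticRegression(max_iter=200)),  # simple stand-in model
    ])
    pipeline.fit(X[:split], y[:split])
    auc = roc_auc_score(y[split:], pipeline.predict_proba(X[split:])[:, 1])
    # Loose sanity bound: the point is that the stages work together,
    # not that the model reaches full quality.
    assert auc > 0.7
```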
Validate model quality before serving
Before pushing a new model version to production, test for the following two types of quality degradation:
- Sudden degradation. A bug in the new version could cause significantly lower quality. Validate new versions by checking their quality against the previous version.
- Slow degradation. A test for sudden degradation might not detect a slow decline in model quality across multiple versions. Instead, ensure that the model's predictions on a validation dataset meet a fixed threshold. If the validation dataset drifts from live data, update the validation dataset and confirm that the model still meets the same quality threshold.
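One possible sketch of both checks, assuming AUC as the quality metric; the previous-version score, the fixed threshold, and the allowed regression are illustrative values that would come from your own release records.

```python
def validate_model_quality(new_auc: float,
                           previous_auc: float,
                           fixed_threshold: float = 0.85,
                           max_regression: float = 0.01) -> None:
    """Raises if the candidate model shows sudden or slow degradation.

    Assumes both AUC values were measured on the same validation dataset;
    the default thresholds are illustrative, not recommendations.
    """
    # Sudden degradation: compare the candidate against the previous version.
    if new_auc < previous_auc - max_regression:
        raise ValueError(
            f"Sudden degradation: AUC dropped from {previous_auc:.3f} "
            f"to {new_auc:.3f}")

    # Slow degradation: also require a fixed, absolute threshold so that
    # small per-version drops can't accumulate unnoticed.
    if new_auc < fixed_threshold:
        raise ValueError(
            f"Slow degradation: AUC {new_auc:.3f} is below the fixed "
            f"threshold {fixed_threshold:.3f}")


# Example: block the release if either check fails.
validate_model_quality(new_auc=0.91, previous_auc=0.90)
```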
Validate model-infrastructure compatibility before serving
If your model is updated faster than your server, then your model could have different software dependencies from your server, potentially causing incompatibilities. Ensure that the operations the model uses are present in the server by staging the model in a sandboxed version of the server.
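A hedged sketch of such a staging check, assuming a TensorFlow SavedModel with a single-input serving signature; the model path, input shape, and the idea of running this inside a sandboxed copy of the serving image are all assumptions.

```python
# Run this inside a sandboxed copy of the serving environment (for example,
# the serving container image), not on the training machines, so that any
# missing or mismatched ops surface before production.
import numpy as np
import tensorflow as tf

MODEL_DIR = "gs://your-bucket/unicorn_model/candidate"  # placeholder path


def test_model_loads_and_serves_in_sandbox():
    model = tf.saved_model.load(MODEL_DIR)
    serving_fn = model.signatures["serving_default"]

    # Minimal illustrative request with the expected input shape.
    batch = tf.constant(np.zeros((1, 8), dtype=np.float32))
    outputs = serving_fn(batch)

    # If an op used by the model is missing from this server build,
    # loading the model or calling the signature above raises an error.
    assert all(np.all(np.isfinite(v.numpy())) for v in outputs.values())
```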