AutoML: Getting started
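If you are considering AutoML, you may have questions about how it works and what steps to take to get started. This section takes a closer look at common AutoML patterns, how AutoML works, and the steps you may need to take before you start using AutoML for your project.
AutoML tools fall into two main categories:
- No-code tools usually take the form of web applications that let you configure and run experiments through a user interface, so you can find the best model for your data without writing any code.
- API and CLI tools provide advanced automation features, but require more (sometimes significantly more) programming and machine learning expertise.
AutoML tools that require code can be more powerful and more flexible than no-code tools, but they can also be harder to use. This module focuses on the no-code options for model development, but keep in mind that API and CLI options can help if you need customized automation.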
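AutoML workflow
Let's walk through a typical machine learning workflow and see how it works when you use AutoML. The high-level steps are the same as those you use for custom training; the main difference is that AutoML handles some of the tasks for you.
Problem definition
The first step in any ML workflow is to define your problem. When you use AutoML, make sure the tool you choose supports the objectives of your ML project. Most AutoML tools support a variety of supervised machine learning algorithms and input data types.
For more information about problem framing, see the Introduction to Machine Learning Problem Framing module.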
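Data gathering
Before you can start working with an AutoML tool, you need to collect your data into a single data source. Check the product documentation to make sure the tool supports your data source, the data types in your dataset, and the size of your dataset.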
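As a minimal sketch of the gathering step, the Python below merges several CSV exports into one table and checks the result against a row limit before import. The `gather` helper and the `MAX_ROWS` value are hypothetical; take the real size limits and supported data sources from your tool's documentation.

```python
import csv
import io

# Hypothetical size limit; the real value depends on your AutoML product.
MAX_ROWS = 1_000_000


def gather(sources: list[str]) -> list[dict[str, str]]:
    """Merge CSV exports (passed in as text) into a single table.

    Every source must share the same header, so the tool sees one schema.
    """
    rows: list[dict[str, str]] = []
    header = None
    for text in sources:
        reader = csv.DictReader(io.StringIO(text))
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            raise ValueError(f"schema mismatch: {reader.fieldnames} != {header}")
        rows.extend(reader)
    if len(rows) > MAX_ROWS:
        raise ValueError(f"{len(rows)} rows exceeds the limit of {MAX_ROWS}")
    return rows


jan = "sqft,price\n1200,250000\n900,180000\n"
feb = "sqft,price\n1500,310000\n"
table = gather([jan, feb])
print(len(table))  # 3
```

Failing fast on a schema mismatch is a design choice: it is cheaper to catch inconsistent exports here than after a long import.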
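Data preparation
Data preparation is an area where AutoML tools can help, but no tool can do everything automatically, so expect to do some work before you can import your data into the tool. Preparing data for AutoML is similar to what you would need to do to train a model manually. To learn more about preparing your data for training, see the Data Preparation section.
For more information on preparing your data, see the Working with numerical data and Working with categorical data modules.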
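Before importing your data for AutoML training, you need to complete these steps:
- Label your data: every example in your dataset needs a label.
- Clean and format your data: real-world data tends to be messy, so expect to clean it before using it. Even with AutoML, you need to determine the best treatments for your particular dataset and problem. This might require some exploration and possibly multiple AutoML runs before you get the best results.
- Perform feature transformations: some AutoML tools handle certain feature transformations for you. If your tool does not support a transformation you need, or does not support it well, you may need to perform the transformation ahead of time.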
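Labeling and basic cleaning can be checked before import with a few lines of plain Python. This is a minimal sketch, assuming rows arrive as dictionaries of strings and that you choose to drop (rather than hand-label) examples that lack a label; the `prepare` function is hypothetical and not part of any AutoML product.

```python
def prepare(rows: list[dict], label: str) -> tuple[list[dict], int]:
    """Clean string fields and drop examples that have no label.

    Returns the cleaned rows and the number of unlabeled rows that were dropped.
    """
    cleaned = []
    unlabeled = 0
    for row in rows:
        # Trim stray whitespace and turn empty strings into explicit missing values.
        row = {k: (v.strip() or None) if isinstance(v, str) else v for k, v in row.items()}
        if row.get(label) is None:
            unlabeled += 1  # every example needs a label, so this one can't be used
            continue
        cleaned.append(row)
    return cleaned, unlabeled


raw = [
    {"sqft": " 1200 ", "price": "250000"},
    {"sqft": "900", "price": ""},     # missing label
    {"sqft": "", "price": "310000"},  # missing feature value, label present
]
rows, dropped = prepare(raw, "price")
print(dropped)  # 1
```

Reporting the number of dropped rows, instead of silently discarding them, makes it easier to notice when a labeling pipeline is broken.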
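Model development (with no-code AutoML)
AutoML does the work for you during training. However, before you start training, you need to configure your experiment. To set up an AutoML training run, you typically go through these high-level steps:
Import your data
To import your data, specify your data source. During import, the AutoML tool assigns a semantic data type to each feature (column) in your data.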
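To make the idea of semantic types concrete, here is a toy heuristic for guessing a column's type at import time. Real tools use their own, more elaborate rules; the `infer_semantic_type` function, its type names, and the three-word threshold for free text are assumptions for illustration only.

```python
def infer_semantic_type(values: list[str]) -> str:
    """Guess a semantic type for one column, the way an import step might."""
    present = [v for v in values if v.strip()]
    if not present:
        return "unknown"

    def is_number(v: str) -> bool:
        try:
            float(v)
            return True
        except ValueError:
            return False

    if all(is_number(v) for v in present):
        return "numeric"
    # Long multi-word strings look like free text; short labels look like categories.
    avg_words = sum(len(v.split()) for v in present) / len(present)
    return "text" if avg_words >= 3 else "categorical"


columns = {
    "sqft": ["1200", "900", "1500"],
    "city": ["Austin", "Boston", "Austin"],
    "description": ["Sunny loft near the park", "Quiet street, big yard", "Cozy loft"],
}
schema = {name: infer_semantic_type(vals) for name, vals in columns.items()}
print(schema)  # {'sqft': 'numeric', 'city': 'categorical', 'description': 'text'}
```

Because rules like these only look at surface patterns, they can be wrong, which is why the refinement step below asks you to review the assigned types.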
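Analyze your data
AutoML products usually provide tools to analyze your dataset before and after training. As a best practice, use these analysis tools to understand and verify your data before you start an AutoML run.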
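A per-column profile is the kind of summary these analysis tools typically show. The sketch below uses a hypothetical `profile` helper that reports the row count, missing values, distinct values, and numeric range, which is often enough to spot empty columns or implausible values before a run.

```python
def profile(rows: list[dict], column: str) -> dict:
    """Summarize one column: size, missing values, distinct values, numeric range."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v not in (None, "")]
    summary = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    try:
        numbers = [float(v) for v in present]
    except (TypeError, ValueError):
        return summary  # not numeric, so there is no range to report
    if numbers:
        summary["min"] = min(numbers)
        summary["max"] = max(numbers)
    return summary


rows = [{"sqft": "1200"}, {"sqft": "900"}, {"sqft": ""}, {"sqft": "1200"}, {}]
print(profile(rows, "sqft"))
# {'count': 5, 'missing': 2, 'distinct': 2, 'min': 900.0, 'max': 1200.0}
```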
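Refine your data
AutoML tools often provide mechanisms to help you refine your data after importing it and before training. Here are a few tasks you may want to complete to refine your data:
Semantic checking: During import, AutoML tools try to determine the correct semantic type for each feature, but these are only guesses. Check the types assigned to all features and change any that were assigned incorrectly.
For example, you may have postal codes stored as numbers in a column of your database. Most AutoML systems would detect this data as continuous numeric data. That is incorrect for a postal code, so you would probably want to change the semantic type of this feature column from continuous to categorical.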
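The postal-code pitfall is easy to reproduce. In this minimal sketch, a naive rule (the hypothetical `naive_infer`) marks any all-numeric column as continuous, and a manual override switches the postal-code column to categorical; the hypothetical `encode` function shows what each choice does to the value 02134.

```python
def naive_infer(values: list[str]) -> str:
    """Label a column continuous whenever every value parses as a number."""
    try:
        for v in values:
            float(v)
    except ValueError:
        return "categorical"
    return "continuous"


def encode(value: str, semantic_type: str):
    # Continuous features become numbers; categorical ones stay strings, so a
    # postal code keeps its leading zero and is never compared by magnitude.
    return float(value) if semantic_type == "continuous" else value


postal_codes = ["02134", "94043", "10001"]
inferred = {"postal_code": naive_infer(postal_codes)}  # "continuous": wrong
schema = {**inferred, "postal_code": "categorical"}   # manual override

print(encode("02134", inferred["postal_code"]))  # 2134.0
print(encode("02134", schema["postal_code"]))    # 02134
```

Treated as continuous, 94043 would also look "larger" than 10001, a relationship that carries no meaning for postal codes.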
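Transformations: Some tools let users customize data transformations as part of the refinement process. This is sometimes needed when a dataset has potentially predictive features that must be transformed or combined in a way that AutoML tools find difficult to determine without help.
For example, consider a housing dataset that you are using to predict the sale price of a house. Suppose there is a feature called `description` that holds the description of a house listing, and you would like to use this data to create a new feature called `description_length`. Some AutoML systems offer ways to use custom transformations. In this example, there might be a `LENGTH` function that generates the new description-length feature like this: `LENGTH(description)`.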
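If your tool has no such function, or you prefer to derive the feature before import, the same transformation is short in Python. This sketch assumes `LENGTH` counts characters (check your tool's definition) and that a missing description should count as length 0; `add_description_length` is a hypothetical helper.

```python
def add_description_length(rows: list[dict]) -> list[dict]:
    """Return copies of the rows with a derived description_length feature.

    Missing descriptions are treated as empty, giving a length of 0.
    """
    return [
        {**row, "description_length": len(row.get("description") or "")}
        for row in rows
    ]


listings = [
    {"description": "Cozy loft", "price": 250000},
    {"description": None, "price": 180000},
]
for row in add_description_length(listings):
    print(row["description_length"])  # 9, then 0
```

Returning new dictionaries rather than mutating the input keeps the original data available if you later want to try a different transformation.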
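Configure AutoML run parameters
The last step before running your training experiment is to choose a few configuration settings that tell the tool how you want it to train your model. Although each AutoML tool has its own set of configuration options, here are a few of the significant configuration tasks you may need to complete:
- Select the type of ML problem you plan to solve. For example, are you solving a classification or a regression problem?
- Select which column in your dataset is the label.
- Select the set of features used to train the model.
- Select the set of ML algorithms AutoML considers in its model search.
- Select the evaluation metric AutoML uses to choose the best model.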
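These settings can be captured as one configuration object and sanity-checked before a run. This is a minimal Python sketch; `RunConfig`, the algorithm names, and the metric names are hypothetical placeholders, not any product's actual options.

```python
from dataclasses import dataclass, field

# Hypothetical metric names per problem type; real tools define their own lists.
METRICS = {
    "classification": {"accuracy", "log_loss", "auc_roc"},
    "regression": {"rmse", "mae"},
}


@dataclass
class RunConfig:
    problem_type: str
    label_column: str
    feature_columns: list[str]
    algorithms: list[str] = field(default_factory=lambda: ["linear", "boosted_trees"])
    metric: str = "rmse"

    def validate(self, dataset_columns: set[str]) -> None:
        """Raise ValueError if the settings don't fit the problem or the dataset."""
        if self.problem_type not in METRICS:
            raise ValueError(f"unknown problem type: {self.problem_type}")
        if self.label_column not in dataset_columns:
            raise ValueError(f"label column not found: {self.label_column}")
        missing = set(self.feature_columns) - dataset_columns
        if missing:
            raise ValueError(f"feature columns not found: {sorted(missing)}")
        if self.label_column in self.feature_columns:
            raise ValueError("the label must not also be used as a feature")
        if not self.algorithms:
            raise ValueError("select at least one algorithm to search over")
        if self.metric not in METRICS[self.problem_type]:
            raise ValueError(f"{self.metric} is not a {self.problem_type} metric")


config = RunConfig(
    problem_type="regression",
    label_column="price",
    feature_columns=["sqft", "postal_code", "description_length"],
)
config.validate({"price", "sqft", "postal_code", "description_length"})
```

Checking that the label is not also listed as a feature guards against label leakage, which would make evaluation results look far better than real-world performance.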
After configuring your AutoML experiment, you are ready to start the training run. Training may take a while to complete (on the order of hours).
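Conceptually, a training run is a search: fit each candidate model on training data, score it on held-out data with the chosen metric, and keep the best. The toy Python below mimics that loop with two hand-written candidates (a mean baseline and one-feature least squares) scored by RMSE. It is an illustration of the idea, not how any specific product works; real tools search much larger spaces of algorithms and hyperparameters, which is why runs can take hours.

```python
import math
from statistics import mean


def fit_mean(xs, ys):
    """Baseline: always predict the average training label."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Ordinary least squares with a single feature."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def rmse(model, xs, ys):
    return math.sqrt(mean((model(x) - y) ** 2 for x, y in zip(xs, ys)))


CANDIDATES = {"mean_baseline": fit_mean, "linear": fit_linear}


def search(train, valid):
    """Fit every candidate on train, score on valid, return the best name and all scores."""
    scores = {name: rmse(fit(*train), *valid) for name, fit in CANDIDATES.items()}
    best = min(scores, key=scores.get)  # lower RMSE is better
    return best, scores


train = ([1, 2, 3, 4], [2, 4, 6, 8])
valid = ([5, 6], [10, 12])
best, scores = search(train, valid)
print(best)  # linear
```

Scoring on held-out data rather than training data is what keeps the search from simply rewarding the candidate that memorizes best.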
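Evaluate model
After training, you can examine the results with the tools your AutoML product provides, which help you:
- Evaluate your features by examining feature importance metrics.
- Understand your model by examining the architecture and hyperparameters used to build it.
- Evaluate top-level model performance with the plots and metrics collected during training for the output model.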
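Feature importance can be computed in several ways. One common, model-agnostic method is permutation importance: scramble one feature's values and measure how much the error grows. In this minimal sketch the column is reversed rather than randomly shuffled so the result is deterministic, and `model` is a hypothetical stand-in for a trained model that uses only `sqft`. Whether your AutoML product uses this particular method is something to confirm in its documentation.

```python
import math


def rmse(predict, rows, label):
    return math.sqrt(sum((predict(r) - r[label]) ** 2 for r in rows) / len(rows))


def permutation_importance(predict, rows, features, label):
    """Increase in RMSE when each feature's values are scrambled."""
    baseline = rmse(predict, rows, label)
    importance = {}
    for f in features:
        # Reversal is a deterministic stand-in for a random shuffle.
        permuted_values = [r[f] for r in rows][::-1]
        permuted = [{**r, f: v} for r, v in zip(rows, permuted_values)]
        importance[f] = rmse(predict, permuted, label) - baseline
    return importance


def model(row):
    # Hypothetical trained model: price depends only on square footage.
    return 200 * row["sqft"]


rows = [
    {"sqft": 1000, "bedrooms": 2, "price": 200_000},
    {"sqft": 1500, "bedrooms": 3, "price": 300_000},
    {"sqft": 2000, "bedrooms": 3, "price": 400_000},
]
importance = permutation_importance(model, rows, ["sqft", "bedrooms"], "price")
print(importance["bedrooms"])  # 0.0
```

Scrambling `bedrooms` leaves the error unchanged because the model ignores it, while scrambling `sqft` increases the error sharply, which is exactly what an importance metric should reveal.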
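Productionization
Although it is outside the scope of this module, some AutoML systems can help you test and deploy your model.
Retrain model
You might need to retrain the model with new data. This might happen after you evaluate your AutoML training run, or after your model has been in production for some time. Either way, AutoML systems can help with retraining too. It is not uncommon to take another look at your data after an AutoML run and retrain with an improved dataset.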
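Assembling an improved dataset often means folding newly labeled examples into the existing training data. As a minimal sketch, assuming each example has a stable `id` key, newer rows can replace older rows with the same key before you start another AutoML run; the `merge_for_retraining` helper is hypothetical.

```python
def merge_for_retraining(old_rows: list[dict], new_rows: list[dict], key: str = "id") -> list[dict]:
    """Combine existing and new examples; a new row replaces an old row with the same key."""
    merged = {row[key]: row for row in old_rows}
    merged.update({row[key]: row for row in new_rows})  # newer data wins on conflicts
    return list(merged.values())


old = [{"id": 1, "price": 250000}, {"id": 2, "price": 180000}]
new = [{"id": 2, "price": 185000}, {"id": 3, "price": 310000}]  # id 2 was corrected
print(merge_for_retraining(old, new))
# [{'id': 1, 'price': 250000}, {'id': 2, 'price': 185000}, {'id': 3, 'price': 310000}]
```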
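What's next
Congratulations on finishing this module!
We encourage you to explore the various MLCC modules at your own pace and according to your interests. If you'd like to follow a recommended order, we suggest moving on to the following module next: ML Fairness.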
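Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-02-26 UTC.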