AutoML: Getting started
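If you are considering AutoML, you may have questions about how it works and what steps to take to get started. This section looks more closely at common AutoML patterns, explains how AutoML works, and outlines the steps you may need to take before using AutoML in your project.
AutoML tools fall into two main categories:
- No-code tools typically take the form of web applications that let you configure and run experiments through a user interface to find the best model for your data, without writing any code.
- API and CLI tools provide advanced automation features, but require more (sometimes significantly more) programming and ML expertise.
AutoML tools that require coding can be more powerful and more flexible than no-code tools, but they can also be harder to use. This module focuses on the no-code options for model development, but API and CLI options can help if you need customized automation.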
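AutoML workflow
Let's walk through a typical ML workflow and see what happens when you use AutoML. The high-level steps in the workflow are the same as those you use for custom training; the main difference is that AutoML handles some of the tasks for you.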
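To make "AutoML handles some of the tasks for you" concrete, here is a toy sketch of the kind of loop such a tool automates: train several candidate models, score each on held-out data with one metric, and keep the best. The data, the two candidates, and the `search` helper are all hypothetical illustrations, not any product's actual search procedure.

```python
from statistics import mean

# Made-up data where the label is roughly 2 * x + 1.
train = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8), (4.0, 9.1)]
valid = [(5.0, 11.0), (6.0, 13.1)]

def fit_mean(rows):
    """Baseline candidate: always predict the mean training label."""
    average = mean(y for _, y in rows)
    return lambda x: average

def fit_linear(rows):
    """Candidate: one-feature least-squares line."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in rows) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept

def mean_absolute_error(model, rows):
    return mean(abs(model(x) - y) for x, y in rows)

def search(candidates, train_rows, valid_rows, metric):
    """Train every candidate and return (score, name) for the lowest error."""
    scored = []
    for name, fit in candidates.items():
        model = fit(train_rows)
        scored.append((metric(model, valid_rows), name))
    return min(scored)

best_score, best_name = search(
    {"mean_baseline": fit_mean, "linear": fit_linear},
    train,
    valid,
    mean_absolute_error,
)
print(best_name, round(best_score, 3))
```

Real tools search far larger spaces (model families, hyperparameters, preprocessing), but the shape of the loop is similar: candidates in, one evaluation metric, one winner out.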
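Problem definition
The first step in any ML workflow is to define your problem. When you use AutoML, make sure the tool you choose can support the objectives of your ML project. Most AutoML tools support a variety of supervised machine learning algorithms and input data types.
For more information about problem framing, see the Introduction to Machine Learning Problem Framing module.
Data gathering
Before you can start working with an AutoML tool, you need to collect your data into a single data source. Check the product documentation to make sure the tool supports your data source, the data types in your dataset, and the size of your dataset.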
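A quick pre-flight check can catch size and shape problems before an upload fails. The sketch below assumes a CSV source and uses made-up limits (`MAX_ROWS`, `MAX_COLUMNS`); the real limits come from your tool's documentation.

```python
import csv
import io

# Hypothetical limits; look up the real ones in your tool's documentation.
MAX_ROWS = 1_000_000
MAX_COLUMNS = 1_000

# In-memory stand-in for a real file so the sketch is self-contained.
raw = io.StringIO("price,bedrooms,postal_code\n350000,3,94103\n420000,4,94110\n")

def check_limits(handle, max_rows=MAX_ROWS, max_columns=MAX_COLUMNS):
    """Return a list of problems found; an empty list means all checks passed."""
    reader = csv.reader(handle)
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    problems = []
    if len(header) > max_columns:
        problems.append(f"{len(header)} columns exceeds the limit of {max_columns}")
    row_count = 0
    for row in reader:
        row_count += 1
        if len(row) != len(header):
            problems.append(
                f"data row {row_count}: expected {len(header)} fields, got {len(row)}"
            )
    if row_count > max_rows:
        problems.append(f"{row_count} rows exceeds the limit of {max_rows}")
    return problems

problems = check_limits(raw)
print(problems or "ok")
```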
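Data preparation
AutoML tools can help with data preparation, but no tool can do everything automatically, so expect to do some work before you can import your data into the tool. Preparing data for AutoML is similar to what you would do to train a model manually. If you need to know more about how to prepare your data for training, see the Data Preparation section.
For more information on preparing your data, see the working with numerical data and working with categorical data modules.
Before importing your data for AutoML training, you need to complete these steps:
- Label your data: every example in your dataset needs a label.
- Clean and format data: real-world data tends to be messy, so expect to clean your data before using it. Even with AutoML, you need to determine the best treatments for your particular dataset and problem. This might require some exploration, and potentially multiple AutoML runs, before you get the best results.
- Perform feature transformations: some AutoML tools handle certain feature transformations for you. If the tool you are using does not support a transformation you need, or does not support it well, you may need to perform the transformation ahead of time.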
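As a minimal sketch of this kind of pre-import preparation, the snippet below drops examples that have no label and normalizes a few messy values. The column names, the missing-value markers, and the `prepare` helper are assumptions made for illustration; the right treatments depend on your dataset.

```python
# Made-up rows as they might arrive from a raw export.
rows = [
    {"price": "350000", "bedrooms": " 3 ", "city": "san francisco"},
    {"price": "", "bedrooms": "4", "city": "Oakland"},  # no label
    {"price": "420000", "bedrooms": "n/a", "city": " OAKLAND"},
]

LABEL = "price"
MISSING_MARKERS = {"", "n/a", "na", "null"}  # assumed markers, not a standard

def clean_value(value):
    """Trim whitespace and map common missing-value markers to None."""
    text = value.strip()
    return None if text.lower() in MISSING_MARKERS else text

def prepare(raw_rows, label, title_case_columns=()):
    """Clean every value, drop unlabeled rows, and normalize casing."""
    prepared = []
    for raw in raw_rows:
        row = {key: clean_value(value) for key, value in raw.items()}
        if row[label] is None:
            continue  # every example needs a label
        for column in title_case_columns:
            if row[column] is not None:
                row[column] = row[column].title()
        prepared.append(row)
    return prepared

prepared = prepare(rows, LABEL, title_case_columns=("city",))
for row in prepared:
    print(row)
```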
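Model development (with no-code AutoML)
AutoML does the work for you during training. Before you start training, however, you need to configure your experiment. To set up an AutoML training run, you typically work through these high-level steps:
Import your data
To import your data, specify your data source. During import, the AutoML tool assigns a semantic data type to each data value.
Analyze your data
AutoML products usually provide tools to analyze your dataset before and after training. As a best practice, use these analysis tools to understand and verify your data before starting an AutoML run.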
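The analysis views in these tools usually report per-column statistics. The following sketch computes a few of them by hand, assuming rows are dictionaries of strings; the `summarize` helper and the sample rows are hypothetical.

```python
from collections import Counter

# Made-up rows; an empty string stands for a missing value.
rows = [
    {"bedrooms": "3", "city": "Oakland"},
    {"bedrooms": "", "city": "Oakland"},
    {"bedrooms": "4", "city": "Berkeley"},
]

def summarize(rows, column):
    """Count rows, missing values, and distinct values for one column."""
    values = [row.get(column) for row in rows]
    present = [value for value in values if value not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }

report = {column: summarize(rows, column) for column in rows[0]}
for column, stats in report.items():
    print(column, stats)
```

A high missing count or an unexpectedly large number of distinct values is often the first sign that a column needs cleaning or a different semantic type.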
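Refine your data
AutoML tools often provide mechanisms to help you refine your data after importing and before training. Here are a few tasks you may want to complete to refine your data:
Semantic checking: During import, AutoML tools try to determine the correct semantic type for each feature, but these are only guesses. Check the type assigned to every feature and change any that were assigned incorrectly.
For example, you may have postal codes stored as numbers in a database column. Most AutoML systems would detect this data as continuous numeric data. That is incorrect for a postal code, so you would probably want to change the semantic type of this feature column from continuous to categorical.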
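The postal code case is easy to reproduce. The sketch below uses a deliberately naive type guesser (an assumption for illustration, not how any particular tool infers types) and then applies a manual override, which is the semantic check described above.

```python
def guess_semantic_type(values):
    """Naive guess: numeric if every value parses as a float, else categorical."""
    try:
        for value in values:
            float(value)
    except ValueError:
        return "categorical"
    return "numeric"

# Made-up columns as they might look right after import.
columns = {
    "price": ["350000", "420000", "515000"],
    "postal_code": ["94103", "94110", "10001"],
    "city": ["San Francisco", "San Francisco", "New York"],
}

guessed = {name: guess_semantic_type(values) for name, values in columns.items()}

# Postal codes look numeric but carry no ordering or magnitude,
# so the guessed type is overridden by hand.
overrides = {"postal_code": "categorical"}
schema = {**guessed, **overrides}

print(guessed)
print(schema)
```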
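Transformations: Some tools let users customize data transformations as part of the refinement process. This is sometimes needed when a dataset has potentially predictive features that must be transformed or combined in a way that is difficult for AutoML tools to determine without help.
For example, consider a housing dataset that you are using to predict the sale price of a house. Suppose there is a feature called description that holds the text of the house listing, and you would like to use this data to create a new feature called description_length. Some AutoML systems offer ways to use custom transformations. For this example, there might be a LENGTH function that generates the new description length feature like this: LENGTH(description).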
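If your tool has no such function, the same feature can be computed before import. This sketch mirrors the `LENGTH(description)` example in plain Python; the listings and the `add_description_length` helper are hypothetical.

```python
# Made-up listings; one has a missing description.
listings = [
    {"description": "Cozy studio", "price": 250000},
    {"description": "", "price": 180000},
    {"description": None, "price": 310000},
]

def add_description_length(rows):
    """Return copies of the rows with a description_length feature added."""
    return [
        # A missing description is treated as empty, so its length is 0.
        {**row, "description_length": len(row.get("description") or "")}
        for row in rows
    ]

transformed = add_description_length(listings)
for row in transformed:
    print(row["description_length"], row["price"])
```

Treating a missing description as length 0 is one choice among several; depending on the dataset, leaving it missing may be the better signal.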
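Configure AutoML run parameters
The last step before running your training experiment is to choose a few configuration settings that tell the tool how you want it to train your model. Although each AutoML tool has its own set of configuration options, here are some of the significant configuration tasks you may need to complete:
- Select the ML problem type you plan to solve. For example, are you solving a classification problem or a regression problem?
- Select which column in your dataset is the label.
- Select the set of features to use to train the model.
- Select the set of ML algorithms AutoML considers in the model search.
- Select the evaluation metric AutoML uses to choose the best model.
After configuring your AutoML experiment, you are ready to start the training run. Training may take a while to complete (on the order of hours).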
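Because a run can take hours, it is worth validating the configuration before starting. The sketch below models the five settings listed above as a small data class with basic sanity checks. The field names, the supported problem types, and the metric names are assumptions for illustration and do not correspond to a specific product.

```python
from dataclasses import dataclass

# Hypothetical option sets; each tool defines its own.
METRICS_BY_PROBLEM = {
    "classification": {"accuracy", "log_loss"},
    "regression": {"mae", "rmse"},
}

@dataclass
class RunConfig:
    problem_type: str
    label_column: str
    feature_columns: list
    algorithms: list
    metric: str

def validate(config, dataset_columns):
    """Return a list of configuration errors; empty means ready to run."""
    errors = []
    if config.problem_type not in METRICS_BY_PROBLEM:
        errors.append(f"unsupported problem type: {config.problem_type}")
    elif config.metric not in METRICS_BY_PROBLEM[config.problem_type]:
        errors.append(f"metric {config.metric} does not fit {config.problem_type}")
    if config.label_column not in dataset_columns:
        errors.append(f"label column not in dataset: {config.label_column}")
    if config.label_column in config.feature_columns:
        errors.append("label column must not also be a feature")
    missing = [c for c in config.feature_columns if c not in dataset_columns]
    if missing:
        errors.append(f"feature columns not in dataset: {missing}")
    if not config.algorithms:
        errors.append("select at least one algorithm")
    return errors

dataset_columns = ["price", "bedrooms", "postal_code", "description_length"]
config = RunConfig(
    problem_type="regression",
    label_column="price",
    feature_columns=["bedrooms", "postal_code", "description_length"],
    algorithms=["linear", "gradient_boosted_trees"],
    metric="rmse",
)
print(validate(config, dataset_columns) or "ready to train")
```

The check that the label is not also a feature guards against a common source of label leakage.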
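Evaluate model
After training, you can examine the results by using the tools your AutoML product provides to help you:
- Evaluate your features by examining feature importance metrics.
- Understand your model by examining the architecture and hyperparameters used to build it.
- Evaluate top-level model performance with plots and metrics collected during training for the output model.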
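One common way to compute a feature importance metric is permutation importance: scramble one feature's values and measure how much the model's error grows. The sketch below assumes a stand-in `predict` function in place of a trained model and rotates the column by one position as a deterministic substitute for a random shuffle. Tools may use different importance methods, so treat this only as an illustration of the idea.

```python
from statistics import mean

# Made-up evaluation rows.
rows = [
    {"sqft": 1000, "door_color": 1, "price": 200000},
    {"sqft": 1500, "door_color": 0, "price": 300000},
    {"sqft": 2000, "door_color": 1, "price": 400000},
    {"sqft": 2500, "door_color": 0, "price": 500000},
]

def predict(row):
    """Stand-in for a trained model: price depends only on square footage."""
    return 200 * row["sqft"]

def mean_absolute_error(data):
    return mean(abs(predict(row) - row["price"]) for row in data)

def permutation_importance(data, feature):
    """Error increase after scrambling one feature across rows."""
    values = [row[feature] for row in data]
    rotated = values[1:] + values[:1]  # deterministic stand-in for a shuffle
    permuted = [{**row, feature: value} for row, value in zip(data, rotated)]
    return mean_absolute_error(permuted) - mean_absolute_error(data)

importances = {
    feature: permutation_importance(rows, feature)
    for feature in ("sqft", "door_color")
}
print(importances)
```

A feature the model ignores, such as `door_color` here, shows no increase in error, while a feature the model relies on shows a large one.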
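Productionization
Though it is outside the scope of this module, some AutoML systems can help you test and deploy your model.
Retrain model
You might need to retrain the model with new data. This can happen after you evaluate your AutoML training run, or after your model has been in production for some time. Either way, AutoML systems can help with retraining too. It is not uncommon to take another look at your data after an AutoML run and retrain with an improved dataset.
What's next
Congratulations on finishing this module!
We encourage you to explore the various MLCC modules at your own pace and according to your interests. If you'd like to follow a recommended order, we suggest moving on to this module next: ML Fairness.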
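Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-02-26 UTC.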