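Conclusion

Text classification is a fundamental machine learning problem with applications across a wide range of products. In this guide, we broke the text classification workflow down into several steps. For each step, we suggested a customized approach based on the characteristics of your specific dataset. In particular, we use the ratio of the number of samples to the number of words per sample to suggest a model type that gets you close to the best performance quickly. The other steps are designed around this choice. We hope that following this guide, together with the accompanying code (https://github.com/google/eng-edu/tree/master/ml/guides/text_classification) and the flowchart, helps you learn, understand, and quickly arrive at a first-cut solution to your text classification problem.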
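As a reminder of how the model-type choice works, here is a minimal sketch of the ratio calculation. Several details are assumptions rather than statements from this page: the function names are hypothetical, words are counted by splitting on whitespace, "words per sample" is taken as the median across samples, and the threshold of 1500 is the value recalled from the guide's model-selection step, so check it against that step before relying on it.

```python
from statistics import median

# Assumed threshold, recalled from the guide's model-selection step.
# It is not stated in this section, so treat it as a tunable parameter.
RATIO_THRESHOLD = 1500


def samples_to_words_ratio(texts):
    """Return (number of samples) / (median number of words per sample).

    Words are counted by whitespace splitting, which is an assumption
    made for this sketch, not the guide's tokenizer.
    """
    if not texts:
        raise ValueError("texts must contain at least one sample")
    words_per_sample = [len(text.split()) for text in texts]
    median_words = median(words_per_sample)
    if median_words == 0:
        raise ValueError("median words per sample is zero")
    return len(texts) / median_words


def suggest_model_type(texts, threshold=RATIO_THRESHOLD):
    """Suggest an n-gram model for small ratios, a sequence model otherwise."""
    ratio = samples_to_words_ratio(texts)
    return "n-gram model" if ratio < threshold else "sequence model"
```

For example, calling `suggest_model_type(train_texts)` on a list of raw training strings returns one of the two labels, which you can then use to pick the matching preprocessing and model-building steps.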
Last updated (UTC): 2025-07-27.