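Variable importances
Variable importance (also known as feature importance) is a score that indicates how "important" a feature is to the model. For example, if a given model has two input features "f1" and "f2" and the variable importances are {f1=5.8, f2=2.5}, then feature "f1" is more "important" to the model than feature "f2". As with other machine learning models, variable importances are a simple way to understand how a decision tree works.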
You can apply model-agnostic variable importances, such as permutation variable importances, to decision trees.
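The idea behind permutation variable importance can be sketched on a toy, hand-written classifier: shuffle one feature's column, re-evaluate the unchanged model, and record the drop in accuracy. This is a minimal pure-Python illustration under stated assumptions, not YDF's implementation; `permutation_importance`, `accuracy`, `num_repeats`, the toy model, and the toy data are all hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]
Model = Callable[[Row], int]


def accuracy(model: Model, rows: Sequence[Row], labels: Sequence[int]) -> float:
    """Fraction of rows the model classifies correctly."""
    correct = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return correct / len(rows)


def permutation_importance(
    model: Model,
    rows: Sequence[Row],
    labels: Sequence[int],
    features: List[str],
    num_repeats: int = 5,
    seed: int = 0,
) -> Dict[str, float]:
    """Hypothetical helper: mean accuracy drop when one feature column is shuffled.

    The model is never retrained; only its inputs are perturbed.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances: Dict[str, float] = {}
    for feature in features:
        drops = []
        for _ in range(num_repeats):
            column = [row[feature] for row in rows]
            rng.shuffle(column)
            # Copy each row, overwriting only the shuffled feature.
            shuffled_rows = [{**row, feature: value} for row, value in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled_rows, labels))
        importances[feature] = sum(drops) / num_repeats
    return importances


# Toy data: the label depends only on "f1"; "f2" is noise.
rows = [{"f1": i / 20, "f2": (i * 7 % 20) / 20} for i in range(20)]
labels = [int(row["f1"] > 0.5) for row in rows]


def toy_model(row: Row) -> int:
    # Hypothetical "trained" model that only looks at "f1".
    return int(row["f1"] > 0.5)


importances = permutation_importance(toy_model, rows, labels, ["f1", "f2"])
print(importances)
```

Because shuffling "f2" never changes the toy model's predictions, its importance is exactly 0.0, while shuffling "f1" lowers the accuracy and yields a positive score.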
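Decision trees also have specific variable importances, such as:
- The sum of the split scores of the conditions that use a given variable.
- The number of nodes whose condition uses a given variable.
- The average depth of the first occurrence of a feature across all the tree paths.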
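The three tree-specific measures above can be computed in a single walk over the tree. The sketch below assumes a hypothetical `Node` structure in which every internal node tests one feature and stores its split score; real libraries such as YDF use their own representations. For the average first-occurrence depth, it assumes that root-to-leaf paths that never test a feature are skipped for that feature, which is only one possible convention.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical binary tree node; a leaf when `feature` is None."""

    feature: Optional[str] = None
    score: float = 0.0  # split score of this condition, e.g. an information gain
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def structural_importances(root: Node) -> Dict[str, dict]:
    sum_score: Dict[str, float] = {}
    num_nodes: Dict[str, int] = {}
    first_depths: Dict[str, List[int]] = {}

    def visit(node: Node, depth: int, first_seen: Dict[str, int]) -> None:
        if node.feature is None:
            # Leaf: one complete root-to-leaf path ends here.
            for feature, first_depth in first_seen.items():
                first_depths.setdefault(feature, []).append(first_depth)
            return
        sum_score[node.feature] = sum_score.get(node.feature, 0.0) + node.score
        num_nodes[node.feature] = num_nodes.get(node.feature, 0) + 1
        # Only the first occurrence of a feature on the current path counts.
        if node.feature not in first_seen:
            first_seen = {**first_seen, node.feature: depth}
        visit(node.left, depth + 1, first_seen)
        visit(node.right, depth + 1, first_seen)

    visit(root, 0, {})
    mean_first_depth = {f: sum(d) / len(d) for f, d in first_depths.items()}
    return {
        "sum_score": sum_score,
        "num_nodes": num_nodes,
        "mean_first_depth": mean_first_depth,
    }


# Root tests "f1"; its left child tests "f2"; its right child tests "f1" again.
tree = Node(
    feature="f1", score=0.5,
    left=Node(feature="f2", score=0.125, left=Node(), right=Node()),
    right=Node(feature="f1", score=0.25, left=Node(), right=Node()),
)
result = structural_importances(tree)
# sum_score:        f1 = 0.5 + 0.25 = 0.75, f2 = 0.125
# num_nodes:        f1 = 2, f2 = 1
# mean_first_depth: f1 = 0.0 (the root), f2 = 1.0
print(result)
```

Note how the three measures rank the same features on different scales: a sum of scores, a count, and a depth where smaller means more important.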
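Variable importances can differ by qualities such as:
- semantics
- scale
- properties
Furthermore, variable importances provide different types of information about:
- the model
- the dataset
- the training process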
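For example, the number of conditions containing a specific feature indicates how much a decision tree is looking at this specific feature, which might indicate variable importance. After all, the learning algorithm would not have used a feature in multiple conditions if it did not matter. However, the same feature appearing in multiple conditions might also indicate that a model is trying but failing to generalize the pattern of a feature. For example, this can happen when a feature is just an example identifier with no information to generalize.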
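On the other hand, permuting a feature's values breaks its relationship with the label, so a high permutation variable importance indicates that removing the feature's information hurts the model, which is an indication of variable importance. However, if the model is robust, for example because several features carry the same information, removing any one feature might not hurt the model.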
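To illustrate the robustness caveat, the sketch below uses a hypothetical model that takes a majority vote over three redundant copies ("a", "b", "c") of the same signal. Shuffling any single copy leaves every prediction unchanged, so each copy gets a permutation importance of zero, even though shuffling all three at once hurts accuracy. The model, data, and helper names are invented for illustration.

```python
import random
from typing import Dict, List, Sequence

Row = Dict[str, float]
FEATURES = ["a", "b", "c"]


def majority_model(row: Row) -> int:
    # Hypothetical robust model: predicts 1 when at least two copies exceed 0.5.
    votes = sum(row[name] > 0.5 for name in FEATURES)
    return int(votes >= 2)


def accuracy(rows: Sequence[Row], labels: Sequence[int]) -> float:
    return sum(majority_model(r) == y for r, y in zip(rows, labels)) / len(rows)


def shuffle_columns(rows: Sequence[Row], features: List[str], rng: random.Random) -> List[Row]:
    """Copies `rows`, independently permuting each listed feature column."""
    out = [dict(row) for row in rows]
    for feature in features:
        column = [row[feature] for row in rows]
        rng.shuffle(column)
        for row, value in zip(out, column):
            row[feature] = value
    return out


signal = [i / 20 for i in range(20)]
rows = [{"a": s, "b": s, "c": s} for s in signal]  # three identical copies
labels = [int(s > 0.5) for s in signal]
rng = random.Random(0)

baseline = accuracy(rows, labels)  # 1.0: the vote always matches the label
# Shuffle one copy at a time: the two intact copies still win the vote.
drop_single = {f: baseline - accuracy(shuffle_columns(rows, [f], rng), labels) for f in FEATURES}
# Shuffle all copies together: no intact copy remains.
drop_all = baseline - accuracy(shuffle_columns(rows, FEATURES, rng), labels)
print(drop_single)  # every single-feature drop is 0.0
print(drop_all)
```

The per-feature scores alone would suggest that none of "a", "b", or "c" matters, which is exactly why a low permutation variable importance should not be read as proof that a feature is useless.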
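Because different variable importances inform about different aspects of the model, looking at several variable importances at the same time is informative. For example, if a feature is important according to all the variable importances, this feature is likely important. As another example, if a feature has a high "number of nodes" variable importance and a low "permutation" variable importance, then this feature might be hard to generalize and can hurt the model quality.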
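One way to act on this advice is to normalize each variable importance by its maximum and then compare features across measures. The sketch below uses made-up importance values (including a hypothetical `example_id` feature, echoing the identifier case described earlier) and arbitrary thresholds `high=0.5` and `low=0.1`; none of these names or numbers come from YDF.

```python
from typing import Dict, List


def normalize(importances: Dict[str, float]) -> Dict[str, float]:
    """Scales scores to [0, 1] by dividing by the largest score."""
    top = max(importances.values())
    return {f: (v / top if top > 0 else 0.0) for f, v in importances.items()}


def compare(
    num_nodes: Dict[str, float],
    permutation: Dict[str, float],
    high: float = 0.5,
    low: float = 0.1,
) -> Dict[str, List[str]]:
    nodes_n = normalize(num_nodes)
    perm_n = normalize(permutation)
    features = sorted(nodes_n)
    return {
        # High on both measures: likely truly important.
        "consistently_important": [
            f for f in features if nodes_n[f] >= high and perm_n.get(f, 0.0) >= high
        ],
        # Used in many nodes, but permuting it barely matters: may generalize poorly.
        "suspicious": [
            f for f in features if nodes_n[f] >= high and perm_n.get(f, 0.0) <= low
        ],
    }


# Made-up values for illustration only.
num_nodes = {"f1": 40, "f2": 10, "example_id": 35}
permutation = {"f1": 0.20, "f2": 0.05, "example_id": 0.0}
print(compare(num_nodes, permutation))
# {'consistently_important': ['f1'], 'suspicious': ['example_id']}
```

Normalizing first matters because the two measures have different scales (a node count versus an accuracy drop), so their raw values cannot be compared directly.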
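In YDF, you can see the variable importances of a model by calling `model.describe()` and looking at the "variable importance" tab. See the Model understanding tutorial (https://ydf.readthedocs.io/en/latest/tutorial/model_understanding) for more details.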
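Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-02-25 UTC.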