Random Forest
This is an ox.
Figure 19. An ox.
In 1906, a weight judging competition was held in England. 787 participants guessed the weight of an ox. The median error of the individual guesses was 37 lb (an error of 3.1%). However, the overall median of the guesses was only 9 lb away from the real weight of the ox (1,198 lb), an error of only 0.7%.

Figure 20. Histogram of the individual weight guesses.
This anecdote illustrates the wisdom of the crowd: in certain situations, collective opinion provides very good judgment.
Mathematically, the wisdom of the crowd can be modeled with the central limit theorem: informally, the squared error between a value and the average of N noisy estimates of that value tends to zero with a 1/N factor. However, if the estimates are not independent, the variance of their average is greater.
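A quick way to see the 1/N behavior is to simulate it. The sketch below (a minimal illustration using NumPy, with an arbitrary true value and noise level chosen for this example) averages N independent noisy estimates and compares the resulting mean squared error to the theoretical value noise_std²/N:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

true_value = 1198.0   # the ox's weight, used here as the value to estimate
noise_std = 50.0      # arbitrary noise level, for illustration only
num_trials = 10_000   # number of repeated experiments per N

for n in [1, 10, 100, 1000]:
    # Each row is one experiment: n independent noisy estimates of true_value.
    estimates = true_value + rng.normal(0.0, noise_std, size=(num_trials, n))
    # Squared error between the average of the n estimates and the true value.
    squared_errors = (estimates.mean(axis=1) - true_value) ** 2
    print(f"N={n:5d}  mean squared error ~ {squared_errors.mean():8.2f}"
          f"  (noise_std^2 / N = {noise_std**2 / n:8.2f})")
```

With independent estimates, the printed error closely tracks noise_std²/N; if the estimates were correlated, it would plateau at a higher value, which is the caveat in the paragraph above.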
In machine learning, an ensemble is a collection of models whose predictions are averaged (or aggregated in some way). If the ensemble's models are different enough from each other without being too bad individually, the quality of the ensemble is generally better than the quality of each individual model. An ensemble requires more training and inference time than a single model, since you have to train and run inference on multiple models instead of one.
Informally, for an ensemble to work best, the individual models should be independent. As an illustration, an ensemble composed of 10 copies of the exact same model (that is, not independent at all) won't be better than the individual model. On the other hand, forcing models to be independent could mean making them worse. Effective ensembling requires finding a balance between model independence and the quality of its sub-models.
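As a minimal sketch of prediction averaging (not the random forest algorithm itself, which this course covers later), the following trains a few decision trees on different bootstrap samples and averages their predictions. The dataset, model type, and ensemble size are placeholder choices for illustration, assuming scikit-learn is available:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Toy regression dataset; the specifics are not important for the idea.
X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(seed=0)
models = []
for _ in range(10):
    # Train each sub-model on a different bootstrap sample so the models
    # are not identical copies (a simple way to encourage some independence).
    idx = rng.integers(0, len(X_train), size=len(X_train))
    model = DecisionTreeRegressor(random_state=0)
    model.fit(X_train[idx], y_train[idx])
    models.append(model)

# The ensemble prediction is the average of the individual predictions.
ensemble_pred = np.mean([m.predict(X_test) for m in models], axis=0)

print("single model MSE:", mean_squared_error(y_test, models[0].predict(X_test)))
print("ensemble MSE:    ", mean_squared_error(y_test, ensemble_pred))
```

Training each sub-model on a different resample is one common way to make the models differ; if all ten were trained on identical data with identical settings, averaging their predictions would change nothing.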