Quasi-random search
This unit focuses on quasi-random search.
Why use quasi-random search?
When used as part of an iterative tuning process intended to maximize insight into the tuning problem (what we call the "exploration phase"), we prefer quasi-random search based on low-discrepancy sequences over fancier blackbox optimization tools. Bayesian optimization and similar tools are more appropriate for the exploitation phase. Quasi-random search based on randomly shifted low-discrepancy sequences can be thought of as "jittered, shuffled grid search", since it uniformly, but randomly, explores a given search space and spreads out the search points more than random search does.
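The "more evenly spread" claim can be quantified with the discrepancy of the point set. Below is a minimal sketch, assuming SciPy's `scipy.stats.qmc` module as a stand-in for a production sampler, comparing a scrambled Halton sequence against pseudo-random uniform points with the same budget:

```python
import numpy as np
from scipy.stats import qmc

n, d = 128, 2

# Scrambled Halton sequence: uniform but randomized coverage of [0, 1)^d.
halton_points = qmc.Halton(d=d, scramble=True, seed=0).random(n)

# Pseudo-random uniform points with the same budget, for comparison.
uniform_points = np.random.default_rng(0).random((n, d))

# Lower discrepancy means the points are spread more evenly over the space.
disc_halton = qmc.discrepancy(halton_points)
disc_uniform = qmc.discrepancy(uniform_points)
print(disc_halton, disc_uniform)
```

With a fixed seed, the Halton set typically has markedly lower discrepancy than the pseudo-random set, which is the property motivating the "jittered, shuffled grid search" description.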
The advantages of quasi-random search over more sophisticated blackbox optimization tools (e.g. Bayesian optimization, evolutionary algorithms) include:
- Sampling the search space non-adaptively makes it possible to change the tuning objective in post hoc analysis without rerunning experiments. For example, we usually want to find the best trial in terms of validation error achieved at any point in training. However, the non-adaptive nature of quasi-random search makes it possible to find the best trial based on final validation error, training error, or some alternative evaluation metric, all without rerunning any experiments.
- Quasi-random search behaves in a consistent and statistically reproducible way. It should be possible to reproduce a study from six months ago even if the implementation of the search algorithm changes, as long as it maintains the same uniformity properties. With sophisticated Bayesian optimization software, the implementation might change in an important way between versions, making it much harder to reproduce an old search. It isn't always possible to roll back to an old implementation (e.g. if the optimization tool is run as a service).
- Its uniform exploration of the search space makes it easier to reason about the results and what they might suggest about the search space. For example, if the best point in the traversal of quasi-random search lies at the boundary of the search space, this is a good (but not foolproof) signal that the search space bounds should be changed. However, an adaptive blackbox optimization algorithm might have neglected the middle of the search space because of some unlucky early trials, even if that region happens to contain equally good points, since this exact sort of non-uniformity is what a good optimization algorithm must exploit to speed up the search.
- Unlike with adaptive algorithms, running different numbers of trials in parallel versus sequentially does not produce statistically different results when using quasi-random search (or other non-adaptive search algorithms).
- More sophisticated search algorithms may not always handle infeasible points correctly, especially if they aren't designed with neural network hyperparameter tuning in mind.
- Quasi-random search is simple and works especially well when many tuning trials run in parallel. Anecdotally¹, it is very hard for an adaptive algorithm to beat a quasi-random search that has 2X its budget, especially when many trials need to run in parallel (and thus there are very few chances to make use of previous trial results when launching new trials). Without expertise in Bayesian optimization and other advanced blackbox optimization methods, you might not achieve the benefits they are, in principle, capable of providing. It is hard to benchmark advanced blackbox optimization algorithms under realistic deep learning tuning conditions. They are a very active area of current research, and the more sophisticated algorithms come with their own pitfalls for inexperienced users. Experts in these methods can get good results, but in high-parallelism conditions the search space and budget tend to matter far more.
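The post hoc flexibility described in the first bullet amounts to simply re-ranking stored trial records under a different metric. A minimal sketch, with hypothetical trial records and metric names:

```python
# Hypothetical records from an already-completed quasi-random study
# (metric names and values are illustrative).
trials = [
    {"id": 0, "final_val_error": 0.241, "best_val_error": 0.238, "train_error": 0.190},
    {"id": 1, "final_val_error": 0.229, "best_val_error": 0.229, "train_error": 0.170},
    {"id": 2, "final_val_error": 0.233, "best_val_error": 0.226, "train_error": 0.210},
]

def best_trial(trials, metric):
    """Re-rank completed trials under any recorded metric -- no reruns needed."""
    return min(trials, key=lambda t: t[metric])

# The "best" trial changes with the objective, using the same experiments.
print(best_trial(trials, "final_val_error")["id"])  # 1
print(best_trial(trials, "best_val_error")["id"])   # 2
```

Because the sampling was non-adaptive, swapping the objective after the fact does not bias the comparison; with an adaptive optimizer, the points themselves would have depended on the original objective.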
That said, if your computational resources only allow a small number of trials to run in parallel and you can afford to run many trials in sequence, Bayesian optimization becomes much more attractive, despite making your tuning results harder to interpret.
Where can I find an implementation of quasi-random search?
Open-Source Vizier has an implementation of quasi-random search. Set algorithm="QUASI_RANDOM_SEARCH" in this Vizier usage example. An alternative implementation exists in this hyperparameter sweeps example. Both of these implementations generate a Halton sequence for a given search space (intended to implement a shifted, scrambled Halton sequence, as recommended in Critical Hyper-Parameters: No Random, No Cry).
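In practice, the sampler produces points in the unit hypercube, which are then mapped onto the actual search space. The sketch below uses SciPy's scrambled Halton engine as a stand-in for the Vizier / MLCommons implementations; the hyperparameter names and ranges are illustrative:

```python
import numpy as np
from scipy.stats import qmc

# Scrambled Halton points in the unit square (SciPy stands in here for the
# Vizier / MLCommons implementations linked above).
sampler = qmc.Halton(d=2, scramble=True, seed=0)
unit_points = sampler.random(20)

# Map [0, 1)^2 onto the search space: a log-uniform learning rate in
# [1e-5, 1e-1] and an integer batch size in {32, ..., 512}.
log_lr = -5.0 + unit_points[:, 0] * 4.0          # exponent in [-5, -1)
learning_rate = 10.0 ** log_lr
batch_size = (32 + unit_points[:, 1] * (512 - 32 + 1)).astype(int)

for lr, bs in zip(learning_rate[:3], batch_size[:3]):
    print(f"lr={lr:.2e}  batch_size={bs}")
```

Because the sequence is deterministic for a fixed seed, drawing the points one at a time or in parallel batches yields the same set of trials, which is the parallel-versus-sequential equivalence noted above.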
If a quasi-random search algorithm based on a low-discrepancy sequence is not available, it is possible to substitute pseudo-random uniform search instead, although this is likely to be slightly less efficient. In 1-2 dimensions, grid search is also acceptable, although not in higher dimensions. (See Bergstra & Bengio, 2012.)
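The reason grid search degrades beyond a couple of dimensions (the Bergstra & Bengio argument) is that an n-point grid in d dimensions only ever tests n^(1/d) distinct values of each hyperparameter, so when one hyperparameter dominates, most of the budget is wasted. A tiny illustration:

```python
import numpy as np

n = 16  # total trial budget

# A 2-D grid with n points tests only sqrt(n) distinct values per axis.
side = int(np.sqrt(n))
gx, gy = np.meshgrid(np.linspace(0.0, 1.0, side), np.linspace(0.0, 1.0, side))
grid_points = np.stack([gx.ravel(), gy.ravel()], axis=1)

# Pseudo-random uniform search tests n distinct values per axis.
random_points = np.random.default_rng(0).random((n, 2))

# If only the first hyperparameter actually matters, the grid wastes most
# of its budget re-testing the same few values.
grid_distinct = len(np.unique(grid_points[:, 0]))
random_distinct = len(np.unique(random_points[:, 0]))
print(grid_distinct, random_distinct)  # 4 16
```

Random (and quasi-random) search gives every trial a fresh value along each axis, which is why the gap widens as the dimensionality grows.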
How many trials are needed to get good results with quasi-random search?
There is no general way to determine how many trials are needed to get good results with quasi-random search, but you can look at specific examples. As Figure 3 shows, the number of trials in a study can have a substantial impact on the results:

Figure 3: ResNet-50 tuned on ImageNet with 100 trials. Using bootstrapping, different amounts of tuning budget were simulated. Box plots of the best performance for each trial budget are plotted.
Notice the following about Figure 3:
- The interquartile range when 6 trials were sampled is much larger than when 20 trials were sampled.
- Even with 20 trials, the difference between especially lucky and especially unlucky studies is likely larger than the typical variation between retrains of this model on different random seeds with fixed hyperparameters, which for this workload might be around +/- 0.1% on a validation error rate of roughly 23%.
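The bootstrapping procedure behind Figure 3 is easy to reproduce for your own studies: resample a completed study's trial results at a smaller budget many times and look at the spread of the best result. A sketch using synthetic validation errors (the real figure resamples actual ResNet-50-on-ImageNet trial results):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the validation errors of 100 completed trials.
val_errors = rng.normal(loc=0.245, scale=0.008, size=100).clip(0.23, 0.30)

def simulate_budget(errors, budget, n_boot=1000):
    """Bootstrap: resample `budget` trials and keep the best of each resample."""
    picks = rng.choice(errors, size=(n_boot, budget), replace=True)
    return picks.min(axis=1)

iqrs = {}
for budget in (6, 20):
    best = simulate_budget(val_errors, budget)
    q1, q3 = np.quantile(best, [0.25, 0.75])
    iqrs[budget] = q3 - q1
    print(f"budget={budget}: IQR of best validation error = {iqrs[budget]:.4f}")
```

As in the figure, the spread of the best achievable error shrinks as the simulated budget grows, which gives a concrete way to judge whether a planned budget is large enough.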
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated (UTC): 2025-07-27.
1. Ben Recht and Kevin Jamieson pointed out how strong 2X-budget random search is as a baseline (the Hyperband paper makes similar arguments), but it is certainly possible to find search spaces and problems where state-of-the-art Bayesian optimization techniques crush random search that has 2X the budget. However, in our experience, beating 2X-budget random search gets much harder in the high-parallelism regime, since Bayesian optimization then has no opportunity to observe the results of previous trials.