Design patterns for high volume address validation on Google Cloud Platform
Objective
The High Volume Address Validation tutorial guided you through different scenarios where high volume address validation can be used. In this tutorial, we introduce different design patterns within Google Cloud Platform for running High Volume Address Validation.
We will start with an overview of running High Volume Address Validation in Google Cloud Platform with Cloud Run, Compute Engine, or Google Kubernetes Engine for one-time executions. We will then see how this capability can be included as part of a data pipeline.
By the end of this article, you should have a good understanding of the different options for running Address Validation at high volume in your Google Cloud environment.
Reference architecture on Google Cloud Platform

This section dives deeper into different design patterns for High Volume Address Validation using Google Cloud Platform. By running on Google Cloud Platform, you can integrate with your existing processes and data pipelines.
Running High Volume Address Validation one time on Google Cloud Platform

Shown below is a reference architecture for building an integration on Google Cloud Platform that is better suited to one-off operations or testing.

In this case, we recommend uploading the CSV file to a Cloud Storage bucket. The High Volume Address Validation script can then be run from a Cloud Run environment. However, you can execute it in any other runtime environment, such as Compute Engine or Google Kubernetes Engine. The output CSV can also be uploaded to the Cloud Storage bucket.
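To make the flow concrete, here is a minimal Python sketch of this one-time pattern. The bucket name, blob paths, the input CSV's `address` column, and the `MAPS_API_KEY` environment variable are assumptions for illustration; the request and response shapes follow the public Address Validation API REST reference. Treat it as a starting point, not a production implementation:

```python
# Minimal sketch of the one-time pattern: read an input CSV from Cloud
# Storage, validate each address, and write a results CSV back to the bucket.
import csv
import io
import os

import requests
from google.cloud import storage  # pip install google-cloud-storage

API_KEY = os.environ["MAPS_API_KEY"]  # assumed environment variable
ENDPOINT = f"https://addressvalidation.googleapis.com/v1:validateAddress?key={API_KEY}"

def validate(address_line: str, region: str = "US") -> dict:
    """Call the Address Validation API for a single address line."""
    body = {"address": {"addressLines": [address_line], "regionCode": region}}
    resp = requests.post(ENDPOINT, json=body, timeout=30)
    resp.raise_for_status()
    return resp.json()

def run(bucket_name: str, in_blob: str, out_blob: str) -> None:
    bucket = storage.Client().bucket(bucket_name)
    rows = csv.DictReader(io.StringIO(bucket.blob(in_blob).download_as_text()))
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["address", "validation_granularity"])
    for row in rows:  # assumes an "address" column in the input CSV
        verdict = validate(row["address"])["result"]["verdict"]
        writer.writerow([row["address"], verdict.get("validationGranularity", "")])
    bucket.blob(out_blob).upload_from_string(out.getvalue(), "text/csv")

if __name__ == "__main__":
    # Hypothetical bucket and object names.
    run("my-address-bucket", "input/addresses.csv", "output/validated.csv")
```

The same script runs unchanged on Cloud Run, Compute Engine, or Google Kubernetes Engine, since it only needs network access to the API and to Cloud Storage.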
Running as a Google Cloud Platform data pipeline

The deployment pattern shown in the previous section is great for quickly testing High Volume Address Validation for one-time usage. However, if you need to use it regularly as part of a data pipeline, you can better leverage Google Cloud Platform native capabilities to make it more robust. Some of the changes you can make include:

- In this case, you can dump CSV files in Cloud Storage buckets.
- A Dataflow job can pick up the addresses to be processed and then cache the results in BigQuery.
- The Dataflow Python pipeline can be extended with the High Volume Address Validation logic to validate the addresses from the Dataflow job (see the Beam sketch after this list).

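Below is a minimal Apache Beam sketch of what such a Dataflow job could look like. The project, region, bucket, table name, and schema are placeholders, and making the API key available to workers through an environment variable is an assumption for illustration (in practice you might use Secret Manager or a custom worker container):

```python
# Minimal Beam sketch: read addresses from a CSV in Cloud Storage, validate
# each one against the Address Validation API, and cache results in BigQuery.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

class ValidateAddress(beam.DoFn):
    """Calls the Address Validation API for each address line."""
    def process(self, address_line):
        import os
        import requests  # imported in process() so Dataflow workers pick it up
        key = os.environ["MAPS_API_KEY"]  # assumed to be set on the workers
        resp = requests.post(
            f"https://addressvalidation.googleapis.com/v1:validateAddress?key={key}",
            json={"address": {"addressLines": [address_line]}},
            timeout=30,
        )
        verdict = resp.json().get("result", {}).get("verdict", {})
        yield {
            "address": address_line,
            "granularity": verdict.get("validationGranularity", ""),
        }

def main():
    opts = PipelineOptions(runner="DataflowRunner", project="my-project",
                           region="us-central1", temp_location="gs://my-bucket/tmp")
    with beam.Pipeline(options=opts) as p:
        (p
         | "ReadCSV" >> beam.io.ReadFromText("gs://my-bucket/input/addresses.csv",
                                             skip_header_lines=1)
         | "Validate" >> beam.ParDo(ValidateAddress())
         | "CacheInBigQuery" >> beam.io.WriteToBigQuery(
               "my-project:addresses.validated",
               schema="address:STRING,granularity:STRING",
               write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))

if __name__ == "__main__":
    main()
```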
Running the script from a data pipeline as a long-lasting recurring process
Another common approach is to validate a batch of addresses as part of a streaming data pipeline, as a recurring process. You may also have the addresses in a BigQuery datastore. In this approach, we will see how to build out a recurring data pipeline that needs to be triggered daily, weekly, or monthly:

- Upload the initial CSV file to a Cloud Storage bucket.
- Use Memorystore as a persistent datastore to maintain intermediate state for the long-running process (a state-tracking sketch follows this list).
- Cache the final addresses in a BigQuery datastore.
- Set up Cloud Scheduler to run the script periodically.

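As a sketch of the state-keeping step, the snippet below uses the standard `redis` client, which Memorystore for Redis is wire-compatible with, to skip addresses that a previous scheduled run already validated. The Redis host, key-naming scheme, and the `validate_fn`/`cache_fn` callbacks are assumptions for illustration:

```python
# Minimal sketch of the Memorystore state-keeping step: remember which
# addresses were already validated so the next scheduled run skips them.
import hashlib

import redis  # pip install redis; Memorystore for Redis speaks this protocol

r = redis.Redis(host="10.0.0.3", port=6379)  # assumed Memorystore private IP

def already_done(address: str) -> bool:
    """Atomically claim an address; True if a previous run already did."""
    digest = hashlib.sha256(address.encode()).hexdigest()
    return not r.setnx(f"validated:{digest}", 1)  # SETNX sets only if absent

def process_batch(addresses, validate_fn, cache_fn):
    """Validate only unseen addresses; hand each verdict to the cache step."""
    for address in addresses:
        if already_done(address):
            continue  # validated in an earlier run of the pipeline
        cache_fn(address, validate_fn(address))
```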
This architecture has the following advantages:
- Using Cloud Scheduler, address validation can be done periodically. You might want to revalidate addresses on a monthly basis, or validate any new addresses on a monthly or quarterly basis. This architecture helps solve that use case.
- If customer data is in BigQuery, the validated addresses or validation flags can be cached directly there (a caching sketch follows this list). Note: what can be cached, and how, is described in detail in the High Volume Address Validation article.
- Using Memorystore provides higher resiliency and the ability to process more addresses. This step adds statefulness to the whole processing pipeline, which is needed for handling very large address datasets. Other database technologies, such as Cloud SQL (https://cloud.google.com/sql) or any other flavor of database that Google Cloud Platform offers, can be used here as well. However, we believe Memorystore best balances the scaling and simplicity needs, so it should be the first choice.
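To illustrate the BigQuery caching step mentioned above, here is a minimal sketch that upserts one verdict per address with a parameterized MERGE statement, so downstream jobs can read the cache instead of re-calling the API. The project, dataset, table, and column names are assumptions:

```python
# Minimal sketch of caching validation flags in BigQuery via an upsert.
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client(project="my-project")

MERGE_SQL = """
MERGE `my-project.addresses.validated` AS t
USING (SELECT @address AS address, @granularity AS granularity,
              CURRENT_TIMESTAMP() AS validated_at) AS s
ON t.address = s.address
WHEN MATCHED THEN
  UPDATE SET granularity = s.granularity, validated_at = s.validated_at
WHEN NOT MATCHED THEN
  INSERT (address, granularity, validated_at)
  VALUES (s.address, s.granularity, s.validated_at)
"""

def cache_verdict(address: str, granularity: str) -> None:
    """Insert a new cache row, or refresh it if the address was seen before."""
    job = client.query(
        MERGE_SQL,
        job_config=bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("address", "STRING", address),
                bigquery.ScalarQueryParameter("granularity", "STRING", granularity),
            ]
        ),
    )
    job.result()  # wait for the upsert to finish
```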
Conclusion
By applying the patterns described here, you can use the Address Validation API for different use cases on Google Cloud Platform.
We have written an open-source Python library to help you get started with the use cases described above. It can be invoked from the command line on your computer, or it can be invoked from Google Cloud Platform or other cloud providers.
Learn more about how to use the library in this article.
Next steps
Download the Improve checkout, delivery, and operations with reliable addresses whitepaper and view the Improving checkout, delivery, and operations with Address Validation webinar.
Suggested further reading:

- Address Validation API documentation
- Geocoding and Address Validation
- Explore the Address Validation demo
Contributors
Google maintains this article. The following contributors originally wrote it.
Principal authors:
Henrik Valve | Solutions Engineer
Thomas Anglaret | Solutions Engineer
Sarthak Ganguly | Solutions Engineer
Last updated 2025-08-27 (UTC).