Conversational Actions were sunset on June 13, 2023. For more information, see "Conversational Actions sunset".
Best Practices for Audio
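This page contains recommendations on how to provide speech data to the Google Assistant API. These guidelines are designed for greater efficiency and accuracy, as well as reasonable response times from the service.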
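Audio pre-processing
It's best to provide audio that is as clean as possible by using a good-quality, well-positioned microphone. However, applying noise-reduction signal processing to the audio before sending it to the service typically reduces recognition accuracy. The service is designed to handle noisy audio.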
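For best results:
- Position the microphone as close to the user as possible, particularly when background noise is present.
- Avoid audio clipping.
- Do not use automatic gain control (AGC).
- Disable all noise-reduction processing.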
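Ideally:
- The audio level should be calibrated so that the input signal does not clip, and peak speech audio levels reach approximately -20 to -10 dBFS.
- The device should exhibit approximately "flat" amplitude versus frequency characteristics (±3 dB from 100 Hz to 8000 Hz).
- Total harmonic distortion should be less than 1% from 100 Hz to 8000 Hz at a 90 dB SPL input level.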
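The level and clipping targets above can be checked in software during calibration. The following is a minimal sketch, assuming 16-bit little-endian mono PCM; the function names, the three-sample clipping heuristic, and the choice of 32768 as full scale are illustrative assumptions, not part of the Google Assistant API.

```python
import math
import struct

# Assumption: 16-bit signed PCM, so full scale is taken as 32768.
FULL_SCALE = 32768


def _samples(pcm: bytes):
    """Unpack 16-bit little-endian PCM bytes into a tuple of ints."""
    count = len(pcm) // 2
    return struct.unpack("<%dh" % count, pcm[: count * 2])


def peak_dbfs(pcm: bytes) -> float:
    """Return the peak sample level in dBFS (-inf for silence or empty input)."""
    samples = _samples(pcm)
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")
    return 20 * math.log10(peak / FULL_SCALE)


def looks_clipped(pcm: bytes, run_length: int = 3) -> bool:
    """Heuristic: True if `run_length` consecutive samples sit at full scale."""
    run = 0
    for s in _samples(pcm):
        if s >= 32767 or s <= -32768:
            run += 1
            if run >= run_length:
                return True
        else:
            run = 0
    return False


def in_target_range(level_dbfs: float) -> bool:
    """True if a peak level falls in the recommended -20 to -10 dBFS window."""
    return -20.0 <= level_dbfs <= -10.0
```

A calibration routine might record a few seconds of normal speech, call peak_dbfs on the buffer, and adjust the fixed input gain until in_target_range returns True and looks_clipped returns False.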
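Sampling rate
If possible, set the sampling rate of the audio source to 16000 Hz. Otherwise, set sample_rate_hertz to match the native sample rate of the audio source (instead of re-sampling).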
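The rule above amounts to a two-way choice, sketched here under stated assumptions: the helper name and the idea of passing in a list of rates the capture device supports are hypothetical, and a plain dict stands in for the real request configuration.

```python
PREFERRED_RATE_HZ = 16000


def choose_sample_rate_hertz(native_rate_hz: int, supported_rates_hz=()) -> int:
    """Pick the value to report as sample_rate_hertz.

    Prefer 16000 Hz when the audio source can capture at that rate;
    otherwise report the native rate unchanged rather than re-sampling.
    """
    if native_rate_hz == PREFERRED_RATE_HZ or PREFERRED_RATE_HZ in supported_rates_hz:
        return PREFERRED_RATE_HZ
    return native_rate_hz


# Hypothetical device that captures natively at 44100 Hz and cannot do 16000 Hz.
audio_in_config = {
    "sample_rate_hertz": choose_sample_rate_hertz(44100, (44100, 48000)),
}
```

The point of the sketch is that the audio data itself is never converted: either the source is opened at 16000 Hz, or the configuration is changed to describe the audio as captured.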
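Frame size
The Google Assistant recognizes live audio as it is captured from a microphone. The audio stream must be split into frames and sent in consecutive AssistRequest messages. Any frame size is acceptable. Larger frames are more efficient, but add latency. A 100-millisecond frame size is recommended as a good tradeoff between latency and efficiency.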
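As a rough illustration of the framing described above, the sketch below converts a frame duration into a byte count and slices a PCM buffer accordingly. It assumes 16-bit mono audio; the function names are hypothetical, and building the actual AssistRequest messages is left out.

```python
def frame_size_bytes(sample_rate_hz: int, frame_ms: int = 100,
                     sample_width_bytes: int = 2, channels: int = 1) -> int:
    """Number of bytes in one frame of the given duration."""
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    return samples_per_frame * sample_width_bytes * channels


def split_into_frames(pcm: bytes, sample_rate_hz: int, frame_ms: int = 100):
    """Yield consecutive frames of `frame_ms` milliseconds.

    The final frame may be shorter if the buffer is not an exact multiple
    of the frame size.
    """
    size = frame_size_bytes(sample_rate_hz, frame_ms)
    if size <= 0:
        raise ValueError("frame size must be positive")
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


# At 16000 Hz, 16-bit mono, a 100 ms frame is 1600 samples, i.e. 3200 bytes.
FRAME_BYTES_16K = frame_size_bytes(16000)
```

In a live capture loop, each yielded chunk would become the audio payload of one AssistRequest message. A smaller frame_ms lowers latency at the cost of more messages per second.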
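Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-07-26 UTC.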