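Integrating Machine Learning APIs

1. Overview

This codelab gives you a quick tour of several machine learning APIs. You will use:

Cloud Vision to understand the content of an image
Cloud Speech-to-Text to transcribe audio to text
Cloud Translation to translate an arbitrary string into any supported language
Cloud Natural Language to extract information from text

What you'll build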

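You'll build a pipeline that compares an audio recording with an image and determines whether the two are relevant to each other. The pipeline transcribes the audio, translates the transcript into English, extracts entities from the translation, labels the image, and checks whether any entity matches a label.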

What you'll learn

How to use the Vision, Speech-to-Text, Translation and Natural Language APIs
Where to find code samples

What you'll need

A browser, such as Chrome or Firefox
Basic knowledge of Python

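2. Setup and requirements

Self-paced environment setup

Sign in to the Cloud Console and create a new project or reuse an existing one. If you don't already have a Gmail or Google Workspace account, you must create one.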

Remember the project ID, a name that is unique across all Google Cloud projects. It is referred to later in this codelab as PROJECT_ID.

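Next, you need to enable billing in the Cloud Console in order to use Google Cloud resources.

Running through this codelab should cost little, if anything. Be sure to follow any instructions in the "Cleaning up" section on shutting down resources so that you don't incur charges after this tutorial. New Google Cloud users are eligible for the $300 USD free trial program.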

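Enable the APIs

You can click this link to enable all the necessary APIs. After you do so, ignore the instructions for setting up authentication; we'll do that in a moment. Alternatively, you can enable each API individually. To do so, click the menu icon in the top left of the screen.

Select APIs & Services from the drop-down menu and click Dashboard.

Click Enable APIs and Services.

Then search for "vision" in the search box. Click Google Cloud Vision API.

Click Enable to enable the Cloud Vision API.

Wait a few seconds for the API to be enabled; the page updates once it is.

Repeat the same process to enable the Cloud Speech, Cloud Translation and Cloud Natural Language APIs.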

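Cloud Shell

Google Cloud Shell is a command-line environment running in the cloud. This Debian-based virtual machine is loaded with all the development tools you'll need (gcloud, bq, git and others) and offers a persistent 5 GB home directory. We'll use Cloud Shell to create our requests to the machine learning APIs.

To get started with Cloud Shell, click the "Activate Google Cloud Shell" icon in the top right-hand corner of the header bar.

A Cloud Shell session opens inside a new frame at the bottom of the console and displays a command-line prompt. Wait until the user@project:~$ prompt appears.

Optional: Code Editor

Depending on your comfort with the command line, you may want to click the "Launch Code Editor" icon in the top right-hand corner of the Cloud Shell bar.

Service account

You need a service account to authenticate. To create one, replace [NAME] with the desired service account name and run the following command in Cloud Shell: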

gcloud iam service-accounts create [NAME]

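Now generate a key to use that service account. Replace [FILE_NAME] with the desired name of the key, [NAME] with the service account name from above and [PROJECT_ID] with the ID of your project. The following command creates the key and downloads it as [FILE_NAME].json: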

gcloud iam service-accounts keys create [FILE_NAME].json --iam-account [NAME]@[PROJECT_ID].iam.gserviceaccount.com

To use the service account, set the GOOGLE_APPLICATION_CREDENTIALS variable to the path of the key. To do this, run the following command after replacing [PATH_TO_FILE] and [FILE_NAME]:

export GOOGLE_APPLICATION_CREDENTIALS=[PATH_TO_FILE]/[FILE_NAME].json
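Before calling any API, it can help to confirm that the variable really points at a readable key file. The following is a minimal sketch, not part of the codelab's samples: check_credentials is a hypothetical helper name, and it assumes the key file is the JSON document produced by the gcloud command above, which normally includes a client_email field.

```python
import json
import os


def check_credentials(env=None):
    """Return the service account email found in the key file, or raise."""
    # Hypothetical helper: accepts a mapping so it can be exercised
    # without touching the real environment.
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise RuntimeError("Key file not found: {}".format(path))
    with open(path) as key_file:
        key = json.load(key_file)
    # Assumption: service account keys carry a "client_email" field.
    return key.get("client_email", "<unknown>")
```

Calling check_credentials() with no arguments reads the real environment, so a failure here points at the export step rather than at the API client.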

3. Cloud Vision

Python client

You'll need the Python client for Cloud Vision. To install it, type the following into Cloud Shell:

pip install --upgrade google-cloud-vision --user

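Let's try it

Let's take a look at the code samples for the Cloud Vision API. We're interested in finding out what's in a specified image, and detect.py looks useful for this, so let's grab it. One way is to copy the contents of detect.py, create a new file in Cloud Shell called vision.py and paste all the code into vision.py. You can do this manually in the Cloud Shell code editor, or you can run this curl command in Cloud Shell: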

curl https://raw.githubusercontent.com/GoogleCloudPlatform/python-docs-samples/master/vision/cloud-client/detect/detect.py -o vision.py

After you've done that, use the API by running the following in Cloud Shell:

python vision.py labels-uri gs://cloud-samples-data/ml-api-codelab/birds.jpg

You should see output about birds and ostriches, since this is the image that was analyzed: https://storage.googleapis.com/cloud-samples-data/ml-api-codelab/birds.jpg
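The command uses a gs:// location while the link above uses an https:// address for the same object. A minimal sketch of that mapping follows; gs_to_https is a hypothetical helper, it only makes sense for publicly readable objects, and it does not URL-encode special characters.

```python
def gs_to_https(gs_uri):
    """Map a gs://bucket/object URI to its storage.googleapis.com URL."""
    prefix = "gs://"
    if not gs_uri.startswith(prefix):
        raise ValueError("expected a gs:// URI, got {!r}".format(gs_uri))
    # Everything after the scheme is "bucket/object", which is also the
    # path used by the public HTTPS endpoint.
    return "https://storage.googleapis.com/" + gs_uri[len(prefix):]


url = gs_to_https("gs://cloud-samples-data/ml-api-codelab/birds.jpg")
print(url)
```

This is handy for opening any of the codelab's sample files in a browser to see or hear what the APIs analyzed.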

What's going on?

You passed 2 arguments to vision.py:

labels-uri selects the detect_labels_uri() function to run
gs://cloud-samples-data/ml-api-codelab/birds.jpg is the location of an image on Google Cloud Storage and is passed as uri into detect_labels_uri()
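To make the argument handling concrete, here is a minimal sketch of how a subcommand such as labels-uri could be routed to a function with argparse. It is an illustration under assumptions, not the actual contents of detect.py: build_parser and run are hypothetical names, and detect_labels_uri is replaced by a stand-in that makes no API call.

```python
import argparse


def detect_labels_uri(uri):
    # Stand-in for the real function so the sketch runs without the API.
    return "would detect labels in {}".format(uri)


def build_parser():
    parser = argparse.ArgumentParser(description="Subcommand dispatch sketch")
    subparsers = parser.add_subparsers(dest="command")
    # The first argument picks the subcommand; the second is its input.
    labels_uri = subparsers.add_parser("labels-uri")
    labels_uri.add_argument("uri")
    return parser


def run(argv):
    args = build_parser().parse_args(argv)
    if args.command == "labels-uri":
        return detect_labels_uri(args.uri)
    raise SystemExit("no known command given")


result = run(["labels-uri", "gs://cloud-samples-data/ml-api-codelab/birds.jpg"])
print(result)
```

The same pattern would apply to translate-text and entities-text in the later sections, each with its own positional arguments.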

Let's take a closer look at detect_labels_uri(). Note the additional comments that have been inserted.

def detect_labels_uri(uri):
    """Detects labels in the file located in Google Cloud Storage or on the
    Web."""
    # relevant import from above
    # from google.cloud import vision

    # create ImageAnnotatorClient object
    client = vision.ImageAnnotatorClient()

    # create Image object
    image = vision.types.Image()

    # specify location of image
    image.source.image_uri = uri

    # get label_detection response by passing image to client
    response = client.label_detection(image=image)

    # get label_annotations portion of response
    labels = response.label_annotations
    print('Labels:')

    for label in labels:
        # print the label descriptions
        print(label.description)

4. Cloud Speech-to-Text

Python client

You'll need the Python client for Cloud Speech-to-Text. To install it, type the following into Cloud Shell:

sudo pip install --upgrade google-cloud-speech

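Let's try it

Let's head over to the code samples for Cloud Speech-to-Text. We're interested in transcribing speech audio, and transcribe.py looks like a good place to start, so let's use it. Copy the contents of transcribe.py, create a new file in Cloud Shell called speech2text.py and paste all the code into speech2text.py. You can do this manually in the Cloud Shell code editor, or you can run this curl command in Cloud Shell: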

curl https://raw.githubusercontent.com/GoogleCloudPlatform/python-docs-samples/master/speech/cloud-client/transcribe.py -o speech2text.py

After you've done that, use the API by running the following in Cloud Shell:

python speech2text.py gs://cloud-samples-data/ml-api-codelab/tr-ostrich.wav

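You should get an error complaining about the wrong encoding and sample rate. Don't worry: go into transcribe_gcs() in the code and delete the encoding and sample_rate_hertz settings from RecognitionConfig(). While you're at it, change the language code to 'tr-TR', since tr-ostrich.wav is a speech recording in Turkish.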

config = types.RecognitionConfig(language_code='tr-TR')

Now run speech2text.py again. The output should be some Turkish text, since this is the audio that was analyzed: https://storage.googleapis.com/cloud-samples-data/ml-api-codelab/tr-ostrich.wav

What's going on?

You passed gs://cloud-samples-data/ml-api-codelab/tr-ostrich.wav, the location of an audio file on Google Cloud Storage, to speech2text.py, which then passes it as gcs_uri into transcribe_gcs().

Let's take a closer look at our modified transcribe_gcs().

def transcribe_gcs(gcs_uri):
    """Transcribes the audio file specified by the gcs_uri."""

    from google.cloud import speech
    # enums no longer used
    # from google.cloud.speech import enums
    from google.cloud.speech import types

    # create SpeechClient object
    client = speech.SpeechClient()

    # specify location of speech
    audio = types.RecognitionAudio(uri=gcs_uri)

    # set language to Turkish
    # removed encoding and sample_rate_hertz
    config = types.RecognitionConfig(language_code='tr-TR')

    # get response by passing config and audio settings to client
    response = client.recognize(config, audio)

    # Each result is for a consecutive portion of the audio. Iterate through
    # them to get the transcripts for the entire audio file.
    for result in response.results:
        # The first alternative is the most likely one for this portion.
        # get the transcript of the first alternative
        print(u'Transcript: {}'.format(result.alternatives[0].transcript))

5. Cloud Translation

Python client

You'll need the Python client for Cloud Translation. To install it, type the following into Cloud Shell:

sudo pip install --upgrade google-cloud-translate

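Let's try it

Now let's check out the code samples for Cloud Translation. For the purpose of this codelab, we want to translate text into English, and snippets.py looks like what we want. Copy the contents of snippets.py, create a new file in Cloud Shell called translate.py and paste all the code into translate.py. You can do this manually in the Cloud Shell code editor, or you can run this curl command in Cloud Shell: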

curl https://raw.githubusercontent.com/GoogleCloudPlatform/python-docs-samples/master/translate/cloud-client/snippets.py -o translate.py

After you've done that, use the API by running the following in Cloud Shell:

python translate.py translate-text en '你有沒有帶外套'

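The translation should be "Do you have a jacket?".

What's going on?

You passed 3 arguments to translate.py:

translate-text selects the translate_text() function to run
en is passed as target into translate_text() and specifies the language to translate into
'你有沒有帶外套' is the string to be translated and is passed as text into translate_text()

Let's take a closer look at translate_text(). Note the comments that have been added.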

def translate_text(target, text):
    """Translates text into the target language.

    Target must be an ISO 639-1 language code.
    See https://g.co/cloud/translate/v2/translate-reference#supported_languages
    """
    # relevant imports from above
    # from google.cloud import translate
    # import six

    # create Client object
    translate_client = translate.Client()

    # decode text if it's a binary type
    # six is a python 2 and 3 compatibility library
    if isinstance(text, six.binary_type):
        text = text.decode('utf-8')

    # get translation result by passing text and target language to client
    # Text can also be a sequence of strings, in which case this method
    # will return a sequence of results for each text.
    result = translate_client.translate(text, target_language=target)

    # print original text, translated text and detected original language
    print(u'Text: {}'.format(result['input']))
    print(u'Translation: {}'.format(result['translatedText']))
    print(u'Detected source language: {}'.format(
        result['detectedSourceLanguage']))
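The six check in translate_text() exists for Python 2 compatibility. On Python 3, six.binary_type is simply bytes, so the same normalization can be sketched without the extra dependency. ensure_text is a hypothetical helper name used only for this illustration.

```python
def ensure_text(text, encoding="utf-8"):
    """Return text as str, decoding it first if it arrives as bytes."""
    # On Python 3, six.binary_type is the built-in bytes type.
    if isinstance(text, bytes):
        return text.decode(encoding)
    return text


raw = "你有沒有帶外套".encode("utf-8")
decoded = ensure_text(raw)
print(decoded)
```

The same check appears again in entities_text() in the next section, so a shared helper like this could serve both functions.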

6. Cloud Natural Language

Python client

You'll need the Python client for Cloud Natural Language. To install it, type the following into Cloud Shell:

sudo pip install --upgrade google-cloud-language

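Let's try it

Finally, let's look at the code samples for the Cloud Natural Language API. We want to detect entities in text, and snippets.py seems to contain code that does that. Copy the contents of snippets.py, create a new file in Cloud Shell called natural_language.py and paste all the code into natural_language.py. You can do this manually in the Cloud Shell code editor, or you can run this curl command in Cloud Shell: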

curl https://raw.githubusercontent.com/GoogleCloudPlatform/python-docs-samples/master/language/cloud-client/v1/snippets.py -o natural_language.py

After you've done that, use the API by running the following in Cloud Shell:

python natural_language.py entities-text 'where did you leave my bike'

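The API should identify "bike" as an entity. Entities can be proper nouns (public figures, landmarks and so on) or common nouns (restaurant, stadium and so on).

What's going on?

You passed 2 arguments to natural_language.py:

entities-text selects the entities_text() function to run
'where did you leave my bike' is the string to be analyzed for entities and is passed as text into entities_text()

Let's take a closer look at entities_text(). Note the new comments that have been inserted.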

def entities_text(text):
    """Detects entities in the text."""
    # relevant imports from above
    # from google.cloud import language
    # from google.cloud.language import enums
    # from google.cloud.language import types
    # import six

    # create LanguageServiceClient object
    client = language.LanguageServiceClient()

    # decode text if it's a binary type
    # six is a python 2 and 3 compatibility library
    if isinstance(text, six.binary_type):
        text = text.decode('utf-8')

    # Instantiates a plain text document.
    document = types.Document(
        content=text,
        type=enums.Document.Type.PLAIN_TEXT)

    # Detects entities in the document. You can also analyze HTML with:
    #   document.type == enums.Document.Type.HTML
    entities = client.analyze_entities(document).entities

    # entity types from enums.Entity.Type
    entity_type = ('UNKNOWN', 'PERSON', 'LOCATION', 'ORGANIZATION',
                   'EVENT', 'WORK_OF_ART', 'CONSUMER_GOOD', 'OTHER')

    # print information for each entity found
    for entity in entities:
        print('=' * 20)
        print(u'{:<16}: {}'.format('name', entity.name))
        print(u'{:<16}: {}'.format('type', entity_type[entity.type]))
        print(u'{:<16}: {}'.format('metadata', entity.metadata))
        print(u'{:<16}: {}'.format('salience', entity.salience))
        print(u'{:<16}: {}'.format('wikipedia_url',
              entity.metadata.get('wikipedia_url', '-')))
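The function above turns entity.type into a name by indexing a fixed tuple, which raises IndexError if the API ever returns a value outside that tuple. The following is a minimal sketch of a more defensive lookup, assuming the type arrives as an integer-like value; entity_type_name is a hypothetical helper, and whether your API version returns values beyond the eight listed is an assumption to verify.

```python
ENTITY_TYPE_NAMES = ('UNKNOWN', 'PERSON', 'LOCATION', 'ORGANIZATION',
                     'EVENT', 'WORK_OF_ART', 'CONSUMER_GOOD', 'OTHER')


def entity_type_name(type_value):
    """Map a numeric entity type to its name without raising IndexError."""
    index = int(type_value)
    if 0 <= index < len(ENTITY_TYPE_NAMES):
        return ENTITY_TYPE_NAMES[index]
    # Fall back to a readable placeholder for values the tuple lacks.
    return 'UNRECOGNIZED({})'.format(index)


print(entity_type_name(1))
print(entity_type_name(12))
```

Swapping entity_type[entity.type] for a call like this keeps the printing loop running even when an unexpected type shows up.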

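7. Let's integrate them

Recall what you're building: a pipeline that checks whether a speech recording is relevant to an image.

Let's put everything together now. Create a solution.py file, then copy and paste detect_labels_uri(), transcribe_gcs(), translate_text() and entities_text() from the previous steps into solution.py.

Import statements

Uncomment the import statements and move them to the top. Note that both speech.types and language.types are being imported. This causes a conflict, so remove them and change each occurrence of types in transcribe_gcs() and entities_text() to speech.types and language.types respectively. You should be left with: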

from google.cloud import vision
from google.cloud import speech
from google.cloud import translate
from google.cloud import language
from google.cloud.language import enums
import six

Return results

Instead of printing, have the functions return the results. You should have something similar to:

# import statements

def detect_labels_uri(uri):
    # code

    # we only need the label descriptions
    label_descriptions = []
    for label in labels:
        label_descriptions.append(label.description)

    return label_descriptions

def transcribe_gcs(gcs_uri):
    # code

    # naive assumption that audio file is short
    return response.results[0].alternatives[0].transcript

def translate_text(target, text):
    # code

    # only interested in translated text
    return result['translatedText']

def entities_text(text):
    # code

    # we only need the entity names
    entity_names = []
    for entity in entities:
        entity_names.append(entity.name)

    return entity_names
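The transcribe_gcs() version above returns only the first result, which the comment flags as a naive assumption. A minimal sketch of joining every result follows; it uses SimpleNamespace objects as stand-ins for the API response so it runs without credentials, and full_transcript is a hypothetical helper name.

```python
from types import SimpleNamespace


def full_transcript(response):
    """Join the top alternative of every result into one string."""
    parts = []
    for result in response.results:
        # Skip results that carry no alternatives at all.
        if result.alternatives:
            parts.append(result.alternatives[0].transcript.strip())
    return ' '.join(parts)


# Stand-in for a response with two consecutive portions of audio.
fake_response = SimpleNamespace(results=[
    SimpleNamespace(alternatives=[SimpleNamespace(transcript='first part ')]),
    SimpleNamespace(alternatives=[SimpleNamespace(transcript=' second part')]),
])
print(full_transcript(fake_response))
```

Unlike indexing results[0], this also returns an empty string instead of raising when the response contains no results.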

Use the functions

After all that hard work, you get to call those functions. Go ahead, do it! Here's an example:

def compare_audio_to_image(audio, image):
    """Checks whether a speech audio is relevant to an image."""

    # speech audio -> text
    transcription = transcribe_gcs(audio)

    # text of any language -> english text
    translation = translate_text('en', transcription)

    # text -> entities
    entities = entities_text(translation)

    # image -> labels
    labels = detect_labels_uri(image)

    # naive check for whether entities intersect with labels
    has_match = False
    for entity in entities:
        if entity in labels:
            # print result for each match
            print('The audio and image both contain: {}'.format(entity))
            has_match = True

    # print if there are no matches
    if not has_match:
        print('The audio and image do not appear to be related.')
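The naive check above compares strings exactly, so an entity such as "ball" would not match a label spelled "Ball". A minimal sketch of a case-insensitive comparison follows, assuming both inputs are lists of plain strings as returned by the functions above; find_matches is a hypothetical helper name.

```python
def find_matches(entities, labels):
    """Return the entities that also appear in labels, ignoring case."""
    # Normalize labels once so each lookup is a cheap set membership test.
    label_keys = {label.strip().lower() for label in labels}
    matches = []
    for entity in entities:
        if entity.strip().lower() in label_keys and entity not in matches:
            matches.append(entity)
    return matches


print(find_matches(['ball'], ['Ball', 'Football', 'Player']))
```

This still only catches identical words; related terms such as "bike" and "Bicycle" would need a similarity measure rather than string equality.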

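Support multiple languages

We previously hardcoded Turkish into transcribe_gcs(). Let's change that so the language can be specified from compare_audio_to_image(). Here are the changes required: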

def transcribe_gcs(language, gcs_uri):
    ...
    config = speech.types.RecognitionConfig(language_code=language)

def compare_audio_to_image(language, audio, image):
    transcription = transcribe_gcs(language, audio)

Try it

The final code can be found in solution.py of this GitHub repository. Here is a curl command to grab it:

curl https://raw.githubusercontent.com/googlecodelabs/integrating-ml-apis/master/solution.py -O

The version on GitHub contains argparse, which allows the following from the command line:

python solution.py tr-TR gs://cloud-samples-data/ml-api-codelab/tr-ball.wav gs://cloud-samples-data/ml-api-codelab/football.jpg

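For each match found, the code should print "The audio and image both contain: " followed by the matching item. In the example above, that is "The audio and image both contain: ball".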
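The three positional arguments in that command could be parsed along the following lines. This is a sketch under assumptions, not the actual code in the repository: build_parser and the help strings are made up for illustration, and only the argument order is taken from the command above.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description='Check whether a speech recording is relevant to an image.')
    # Argument order mirrors compare_audio_to_image(language, audio, image).
    parser.add_argument('language', help='language code of the audio, e.g. tr-TR')
    parser.add_argument('audio', help='gs:// location of the speech audio')
    parser.add_argument('image', help='gs:// location of the image')
    return parser


args = build_parser().parse_args([
    'tr-TR',
    'gs://cloud-samples-data/ml-api-codelab/tr-ball.wav',
    'gs://cloud-samples-data/ml-api-codelab/football.jpg',
])
print(args.language, args.audio, args.image)
```

In a real script, parse_args() would be called without a list so that it reads sys.argv, and the parsed values would be handed to compare_audio_to_image().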

Bonus: try more

Here are more audio and image file locations to try.

Turkish speech samples:
gs://cloud-samples-data/ml-api-codelab/tr-ball.wav
gs://cloud-samples-data/ml-api-codelab/tr-bike.wav
gs://cloud-samples-data/ml-api-codelab/tr-jacket.wav
gs://cloud-samples-data/ml-api-codelab/tr-ostrich.wav

German speech samples:
gs://cloud-samples-data/ml-api-codelab/de-ball.wav
gs://cloud-samples-data/ml-api-codelab/de-bike.wav
gs://cloud-samples-data/ml-api-codelab/de-jacket.wav
gs://cloud-samples-data/ml-api-codelab/de-ostrich.wav

Image samples:
gs://cloud-samples-data/ml-api-codelab/bicycle.jpg
gs://cloud-samples-data/ml-api-codelab/birds.jpg
gs://cloud-samples-data/ml-api-codelab/coat_rack.jpg
gs://cloud-samples-data/ml-api-codelab/football.jpg
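The sample files follow a regular naming pattern, so every audio and image combination can be generated rather than typed out. The sketch below only builds the argument tuples and makes no API calls. sample_runs is a hypothetical helper, and the de-DE language code for the German recordings is an assumption, since the codelab only states tr-TR.

```python
from itertools import product

BASE = 'gs://cloud-samples-data/ml-api-codelab/'
# File prefix -> language code; 'de-DE' is assumed, 'tr-TR' is from the text.
LANGUAGES = {'tr': 'tr-TR', 'de': 'de-DE'}
WORDS = ('ball', 'bike', 'jacket', 'ostrich')
IMAGES = ('bicycle.jpg', 'birds.jpg', 'coat_rack.jpg', 'football.jpg')


def sample_runs():
    """Return (language, audio_uri, image_uri) for every combination."""
    runs = []
    for (prefix, code), word, image in product(LANGUAGES.items(), WORDS, IMAGES):
        audio = '{}{}-{}.wav'.format(BASE, prefix, word)
        runs.append((code, audio, BASE + image))
    return runs


runs = sample_runs()
print(len(runs), runs[0])
```

Each tuple matches the parameter order of compare_audio_to_image(language, audio, image), so the list can be looped over to try every pairing.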

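8. Congratulations!

You've explored and integrated four machine learning APIs to determine whether a speech sample is talking about the provided image. This is just the beginning, as there are many more ways this pipeline could be improved!

What we've covered

Making requests to the Cloud Vision API
Making requests to the Cloud Speech-to-Text API
Making requests to the Cloud Translation API
Making requests to the Cloud Natural Language API
Using all of the above APIs together

Next steps

For better comparison of words, take a look at word2vec
Check out the more in-depth codelabs on the Vision API, Speech-to-Text API, Translation API and Natural Language API
Extend the pipeline to video with Cloud Video Intelligence
Synthesize speech audio with the Cloud Text-to-Speech API
Learn how to upload objects to Cloud Storage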
