在 Android 上使用 ML Kit 偵測姿勢

ML Kit 提供兩個針對姿勢偵測所最佳化的 SDK。

SDK 名稱姿勢偵測姿勢偵測與精準度
導入作業在建構期間,程式碼和資產會以靜態方式連結至您的應用程式。在建構期間,程式碼和資產會以靜態方式連結至您的應用程式。
對應用程式大小的影響 (包括程式碼和素材資源)約 10.1 MB約 13.3 MB
效能Pixel 3 XL:約 30 FPSPixel 3XL:使用 CPU 約 23FPS,使用 GPU 可達約 30 FPS

馬上試試

事前準備

  1. 在專案層級的 build.gradle 檔案中,請務必在 buildscriptallprojects 區段中納入 Google 的 Maven 存放區。
  2. 將 ML Kit Android 程式庫的依附元件新增至模組的應用程式層級的 Gradle 檔案,通常為 app/build.gradle

    dependencies {
      // If you want to use the base sdk
      implementation 'com.google.mlkit:pose-detection:18.0.0-beta4'
      // If you want to use the accurate sdk
      implementation 'com.google.mlkit:pose-detection-accurate:18.0.0-beta4'
    }
    

1. 建立「PoseDetector」的執行個體

PoseDetector 種付款方式

如要偵測圖片中的姿勢,請先建立 PoseDetector 的執行個體,並視需要指定偵測工具設定。

偵測模式

PoseDetector 會在兩種偵測模式下運作。請務必選擇符合您用途的選項。

STREAM_MODE (預設)
姿勢偵測器會先偵測圖片中最顯眼的人,然後再執行姿勢偵測。在後續影格中,除非使用者遭到遮蔽或不再以高可信度偵測到,否則系統不會執行人員偵測步驟。姿勢偵測器會嘗試追蹤最有名的人員,並在每次推論時傳回他們的姿勢。這麼做可減少延遲時間和順暢偵測。如要偵測影片串流中的姿勢,請使用這個模式。
SINGLE_IMAGE_MODE
姿勢偵測工具會偵測人,然後執行姿勢偵測。系統會針對每張圖片執行人員偵測步驟,因此延遲時間會比較長,而且不會進行人員追蹤。在靜態圖片上使用姿勢偵測或不需要追蹤的地方時,請使用這個模式。

硬體設定

PoseDetector 支援多種硬體設定,可最佳化效能:

  • CPU:僅使用 CPU 執行偵測工具
  • CPU_GPU:同時使用 CPU 和 GPU 執行偵測工具

建構偵測工具選項時,您可以使用 API setPreferredHardwareConfigs 控管硬體選取項目。根據預設,系統會將所有硬體設定都設為優先設定。

ML Kit 會將每項設定的可用性、穩定性、正確性和延遲時間納入考量,並從您偏好的設定中挑選最合適的設定。如果偏好的設定皆不適用,系統將自動使用 CPU 設定做為備用設定。ML Kit 會在啟用任何加速功能之前,以非阻塞方式執行這些檢查和相關準備作業,因此很可能是使用者第一次執行偵測工具時,會使用 CPU。所有準備作業完成後,最適用的設定會在下列執行作業中使用。

setPreferredHardwareConfigs 的使用範例:

  • 為了讓 ML Kit 挑選最佳設定,請勿呼叫這個 API。
  • 如果不想啟用任何加速功能,請僅傳入 CPU
  • 如果您想使用 GPU 卸載 CPU (即使 GPU 速度可能較慢),請僅傳入 CPU_GPU

指定姿勢偵測器選項:

Kotlin

// Base pose detector with streaming frames, when depending on the pose-detection sdk
val options = PoseDetectorOptions.Builder()
    .setDetectorMode(PoseDetectorOptions.STREAM_MODE)
    .build()

// Accurate pose detector on static images, when depending on the pose-detection-accurate sdk
val options = AccuratePoseDetectorOptions.Builder()
    .setDetectorMode(AccuratePoseDetectorOptions.SINGLE_IMAGE_MODE)
    .build()

Java

// Base pose detector with streaming frames, when depending on the pose-detection sdk
PoseDetectorOptions options =
   new PoseDetectorOptions.Builder()
       .setDetectorMode(PoseDetectorOptions.STREAM_MODE)
       .build();

// Accurate pose detector on static images, when depending on the pose-detection-accurate sdk
AccuratePoseDetectorOptions options =
   new AccuratePoseDetectorOptions.Builder()
       .setDetectorMode(AccuratePoseDetectorOptions.SINGLE_IMAGE_MODE)
       .build();

最後,建立 PoseDetector 的執行個體。傳送您指定的選項:

Kotlin

val poseDetector = PoseDetection.getClient(options)

Java

PoseDetector poseDetector = PoseDetection.getClient(options);

2. 準備輸入圖片

如要偵測圖片中的姿勢,請透過 Bitmapmedia.ImageByteBuffer、位元組陣列或裝置上的檔案建立 InputImage 物件。然後,將 InputImage 物件傳遞至 PoseDetector

如要偵測姿勢,建議使用尺寸至少為 480x360 像素的圖片。如果您正在即時偵測姿勢,以這個最低解析度擷取影格有助於減少延遲。

您可以從不同來源建立 InputImage 物件,詳情請見下文。

使用 media.Image

如要從 media.Image 物件建立 InputImage 物件 (例如使用裝置相機拍照時),請將 media.Image 物件以及圖片的旋轉角度傳遞至 InputImage.fromMediaImage()

如果使用 CameraX 程式庫,OnImageCapturedListenerImageAnalysis.Analyzer 類別會為您計算旋轉值。

Kotlin

private class YourImageAnalyzer : ImageAnalysis.Analyzer {

    override fun analyze(imageProxy: ImageProxy) {
        val mediaImage = imageProxy.image
        if (mediaImage != null) {
            val image = InputImage.fromMediaImage(mediaImage, imageProxy.imageInfo.rotationDegrees)
            // Pass image to an ML Kit Vision API
            // ...
        }
    }
}

Java

private class YourAnalyzer implements ImageAnalysis.Analyzer {

    @Override
    public void analyze(ImageProxy imageProxy) {
        Image mediaImage = imageProxy.getImage();
        if (mediaImage != null) {
          InputImage image =
                InputImage.fromMediaImage(mediaImage, imageProxy.getImageInfo().getRotationDegrees());
          // Pass image to an ML Kit Vision API
          // ...
        }
    }
}

如果您未使用相機程式庫提供圖像的旋轉角度,可以將裝置旋轉角度和裝置相機感應器方向做為計算依據:

Kotlin

private val ORIENTATIONS = SparseIntArray()

init {
    ORIENTATIONS.append(Surface.ROTATION_0, 0)
    ORIENTATIONS.append(Surface.ROTATION_90, 90)
    ORIENTATIONS.append(Surface.ROTATION_180, 180)
    ORIENTATIONS.append(Surface.ROTATION_270, 270)
}

/**
 * Get the angle by which an image must be rotated given the device's current
 * orientation.
 */
@RequiresApi(api = Build.VERSION_CODES.LOLLIPOP)
@Throws(CameraAccessException::class)
private fun getRotationCompensation(cameraId: String, activity: Activity, isFrontFacing: Boolean): Int {
    // Get the device's current rotation relative to its "native" orientation.
    // Then, from the ORIENTATIONS table, look up the angle the image must be
    // rotated to compensate for the device's rotation.
    val deviceRotation = activity.windowManager.defaultDisplay.rotation
    var rotationCompensation = ORIENTATIONS.get(deviceRotation)

    // Get the device's sensor orientation.
    val cameraManager = activity.getSystemService(CAMERA_SERVICE) as CameraManager
    val sensorOrientation = cameraManager
            .getCameraCharacteristics(cameraId)
            .get(CameraCharacteristics.SENSOR_ORIENTATION)!!

    if (isFrontFacing) {
        rotationCompensation = (sensorOrientation + rotationCompensation) % 360
    } else { // back-facing
        rotationCompensation = (sensorOrientation - rotationCompensation + 360) % 360
    }
    return rotationCompensation
}

Java

private static final SparseIntArray ORIENTATIONS = new SparseIntArray();
static {
    ORIENTATIONS.append(Surface.ROTATION_0, 0);
    ORIENTATIONS.append(Surface.ROTATION_90, 90);
    ORIENTATIONS.append(Surface.ROTATION_180, 180);
    ORIENTATIONS.append(Surface.ROTATION_270, 270);
}

/**
 * Get the angle by which an image must be rotated given the device's current
 * orientation.
 */
@RequiresApi(api = Build.VERSION_CODES.LOLLIPOP)
private int getRotationCompensation(String cameraId, Activity activity, boolean isFrontFacing)
        throws CameraAccessException {
    // Get the device's current rotation relative to its "native" orientation.
    // Then, from the ORIENTATIONS table, look up the angle the image must be
    // rotated to compensate for the device's rotation.
    int deviceRotation = activity.getWindowManager().getDefaultDisplay().getRotation();
    int rotationCompensation = ORIENTATIONS.get(deviceRotation);

    // Get the device's sensor orientation.
    CameraManager cameraManager = (CameraManager) activity.getSystemService(CAMERA_SERVICE);
    int sensorOrientation = cameraManager
            .getCameraCharacteristics(cameraId)
            .get(CameraCharacteristics.SENSOR_ORIENTATION);

    if (isFrontFacing) {
        rotationCompensation = (sensorOrientation + rotationCompensation) % 360;
    } else { // back-facing
        rotationCompensation = (sensorOrientation - rotationCompensation + 360) % 360;
    }
    return rotationCompensation;
}

然後,將 media.Image 物件和旋轉角度值傳遞至 InputImage.fromMediaImage()

Kotlin

val image = InputImage.fromMediaImage(mediaImage, rotation)

Java

InputImage image = InputImage.fromMediaImage(mediaImage, rotation);

使用檔案 URI

如要從檔案 URI 建立 InputImage 物件,請將應用程式結構定義和檔案 URI 傳遞至 InputImage.fromFilePath()。使用 ACTION_GET_CONTENT 意圖提示使用者從圖片庫應用程式中選取圖片時,這項功能就很實用。

Kotlin

val image: InputImage
try {
    image = InputImage.fromFilePath(context, uri)
} catch (e: IOException) {
    e.printStackTrace()
}

Java

InputImage image;
try {
    image = InputImage.fromFilePath(context, uri);
} catch (IOException e) {
    e.printStackTrace();
}

使用 ByteBufferByteArray

如要從 ByteBufferByteArray 建立 InputImage 物件,請先按照之前的 media.Image 輸入內容計算圖片旋轉角度。接著,使用緩衝區或陣列建立 InputImage 物件,以及圖片的高度、寬度、顏色編碼格式和旋轉角度:

Kotlin

val image = InputImage.fromByteBuffer(
        byteBuffer,
        /* image width */ 480,
        /* image height */ 360,
        rotationDegrees,
        InputImage.IMAGE_FORMAT_NV21 // or IMAGE_FORMAT_YV12
)
// Or:
val image = InputImage.fromByteArray(
        byteArray,
        /* image width */ 480,
        /* image height */ 360,
        rotationDegrees,
        InputImage.IMAGE_FORMAT_NV21 // or IMAGE_FORMAT_YV12
)

Java

InputImage image = InputImage.fromByteBuffer(byteBuffer,
        /* image width */ 480,
        /* image height */ 360,
        rotationDegrees,
        InputImage.IMAGE_FORMAT_NV21 // or IMAGE_FORMAT_YV12
);
// Or:
InputImage image = InputImage.fromByteArray(
        byteArray,
        /* image width */480,
        /* image height */360,
        rotation,
        InputImage.IMAGE_FORMAT_NV21 // or IMAGE_FORMAT_YV12
);

使用 Bitmap

如要從 Bitmap 物件建立 InputImage 物件,請宣告下列宣告:

Kotlin

val image = InputImage.fromBitmap(bitmap, 0)

Java

InputImage image = InputImage.fromBitmap(bitmap, rotationDegree);

圖像以 Bitmap 物件和旋轉角度表示。

3. 處理圖片

將準備好的 InputImage 物件傳遞至 PoseDetectorprocess 方法。

Kotlin

Task<Pose> result = poseDetector.process(image)
       .addOnSuccessListener { results ->
           // Task completed successfully
           // ...
       }
       .addOnFailureListener { e ->
           // Task failed with an exception
           // ...
       }

Java

Task<Pose> result =
        poseDetector.process(image)
                .addOnSuccessListener(
                        new OnSuccessListener<Pose>() {
                            @Override
                            public void onSuccess(Pose pose) {
                                // Task completed successfully
                                // ...
                            }
                        })
                .addOnFailureListener(
                        new OnFailureListener() {
                            @Override
                            public void onFailure(@NonNull Exception e) {
                                // Task failed with an exception
                                // ...
                            }
                        });

4. 取得偵測到的姿勢相關資訊

如果系統在圖片中偵測到人物,姿勢偵測 API 會傳回包含 33 PoseLandmark 秒的 Pose 物件。

如果人物並未完全出現在圖片中,模型會指派影格外缺少的地標座標,並給予較低的 InFrameConfidence 值。

如果影格中未偵測到任何人,則 Pose 物件不含任何 PoseLandmark

Kotlin

// Get all PoseLandmarks. If no person was detected, the list will be empty
val allPoseLandmarks = pose.getAllPoseLandmarks()

// Or get specific PoseLandmarks individually. These will all be null if no person
// was detected
val leftShoulder = pose.getPoseLandmark(PoseLandmark.LEFT_SHOULDER)
val rightShoulder = pose.getPoseLandmark(PoseLandmark.RIGHT_SHOULDER)
val leftElbow = pose.getPoseLandmark(PoseLandmark.LEFT_ELBOW)
val rightElbow = pose.getPoseLandmark(PoseLandmark.RIGHT_ELBOW)
val leftWrist = pose.getPoseLandmark(PoseLandmark.LEFT_WRIST)
val rightWrist = pose.getPoseLandmark(PoseLandmark.RIGHT_WRIST)
val leftHip = pose.getPoseLandmark(PoseLandmark.LEFT_HIP)
val rightHip = pose.getPoseLandmark(PoseLandmark.RIGHT_HIP)
val leftKnee = pose.getPoseLandmark(PoseLandmark.LEFT_KNEE)
val rightKnee = pose.getPoseLandmark(PoseLandmark.RIGHT_KNEE)
val leftAnkle = pose.getPoseLandmark(PoseLandmark.LEFT_ANKLE)
val rightAnkle = pose.getPoseLandmark(PoseLandmark.RIGHT_ANKLE)
val leftPinky = pose.getPoseLandmark(PoseLandmark.LEFT_PINKY)
val rightPinky = pose.getPoseLandmark(PoseLandmark.RIGHT_PINKY)
val leftIndex = pose.getPoseLandmark(PoseLandmark.LEFT_INDEX)
val rightIndex = pose.getPoseLandmark(PoseLandmark.RIGHT_INDEX)
val leftThumb = pose.getPoseLandmark(PoseLandmark.LEFT_THUMB)
val rightThumb = pose.getPoseLandmark(PoseLandmark.RIGHT_THUMB)
val leftHeel = pose.getPoseLandmark(PoseLandmark.LEFT_HEEL)
val rightHeel = pose.getPoseLandmark(PoseLandmark.RIGHT_HEEL)
val leftFootIndex = pose.getPoseLandmark(PoseLandmark.LEFT_FOOT_INDEX)
val rightFootIndex = pose.getPoseLandmark(PoseLandmark.RIGHT_FOOT_INDEX)
val nose = pose.getPoseLandmark(PoseLandmark.NOSE)
val leftEyeInner = pose.getPoseLandmark(PoseLandmark.LEFT_EYE_INNER)
val leftEye = pose.getPoseLandmark(PoseLandmark.LEFT_EYE)
val leftEyeOuter = pose.getPoseLandmark(PoseLandmark.LEFT_EYE_OUTER)
val rightEyeInner = pose.getPoseLandmark(PoseLandmark.RIGHT_EYE_INNER)
val rightEye = pose.getPoseLandmark(PoseLandmark.RIGHT_EYE)
val rightEyeOuter = pose.getPoseLandmark(PoseLandmark.RIGHT_EYE_OUTER)
val leftEar = pose.getPoseLandmark(PoseLandmark.LEFT_EAR)
val rightEar = pose.getPoseLandmark(PoseLandmark.RIGHT_EAR)
val leftMouth = pose.getPoseLandmark(PoseLandmark.LEFT_MOUTH)
val rightMouth = pose.getPoseLandmark(PoseLandmark.RIGHT_MOUTH)

Java

// Get all PoseLandmarks. If no person was detected, the list will be empty
List<PoseLandmark> allPoseLandmarks = pose.getAllPoseLandmarks();

// Or get specific PoseLandmarks individually. These will all be null if no person
// was detected
PoseLandmark leftShoulder = pose.getPoseLandmark(PoseLandmark.LEFT_SHOULDER);
PoseLandmark rightShoulder = pose.getPoseLandmark(PoseLandmark.RIGHT_SHOULDER);
PoseLandmark leftElbow = pose.getPoseLandmark(PoseLandmark.LEFT_ELBOW);
PoseLandmark rightElbow = pose.getPoseLandmark(PoseLandmark.RIGHT_ELBOW);
PoseLandmark leftWrist = pose.getPoseLandmark(PoseLandmark.LEFT_WRIST);
PoseLandmark rightWrist = pose.getPoseLandmark(PoseLandmark.RIGHT_WRIST);
PoseLandmark leftHip = pose.getPoseLandmark(PoseLandmark.LEFT_HIP);
PoseLandmark rightHip = pose.getPoseLandmark(PoseLandmark.RIGHT_HIP);
PoseLandmark leftKnee = pose.getPoseLandmark(PoseLandmark.LEFT_KNEE);
PoseLandmark rightKnee = pose.getPoseLandmark(PoseLandmark.RIGHT_KNEE);
PoseLandmark leftAnkle = pose.getPoseLandmark(PoseLandmark.LEFT_ANKLE);
PoseLandmark rightAnkle = pose.getPoseLandmark(PoseLandmark.RIGHT_ANKLE);
PoseLandmark leftPinky = pose.getPoseLandmark(PoseLandmark.LEFT_PINKY);
PoseLandmark rightPinky = pose.getPoseLandmark(PoseLandmark.RIGHT_PINKY);
PoseLandmark leftIndex = pose.getPoseLandmark(PoseLandmark.LEFT_INDEX);
PoseLandmark rightIndex = pose.getPoseLandmark(PoseLandmark.RIGHT_INDEX);
PoseLandmark leftThumb = pose.getPoseLandmark(PoseLandmark.LEFT_THUMB);
PoseLandmark rightThumb = pose.getPoseLandmark(PoseLandmark.RIGHT_THUMB);
PoseLandmark leftHeel = pose.getPoseLandmark(PoseLandmark.LEFT_HEEL);
PoseLandmark rightHeel = pose.getPoseLandmark(PoseLandmark.RIGHT_HEEL);
PoseLandmark leftFootIndex = pose.getPoseLandmark(PoseLandmark.LEFT_FOOT_INDEX);
PoseLandmark rightFootIndex = pose.getPoseLandmark(PoseLandmark.RIGHT_FOOT_INDEX);
PoseLandmark nose = pose.getPoseLandmark(PoseLandmark.NOSE);
PoseLandmark leftEyeInner = pose.getPoseLandmark(PoseLandmark.LEFT_EYE_INNER);
PoseLandmark leftEye = pose.getPoseLandmark(PoseLandmark.LEFT_EYE);
PoseLandmark leftEyeOuter = pose.getPoseLandmark(PoseLandmark.LEFT_EYE_OUTER);
PoseLandmark rightEyeInner = pose.getPoseLandmark(PoseLandmark.RIGHT_EYE_INNER);
PoseLandmark rightEye = pose.getPoseLandmark(PoseLandmark.RIGHT_EYE);
PoseLandmark rightEyeOuter = pose.getPoseLandmark(PoseLandmark.RIGHT_EYE_OUTER);
PoseLandmark leftEar = pose.getPoseLandmark(PoseLandmark.LEFT_EAR);
PoseLandmark rightEar = pose.getPoseLandmark(PoseLandmark.RIGHT_EAR);
PoseLandmark leftMouth = pose.getPoseLandmark(PoseLandmark.LEFT_MOUTH);
PoseLandmark rightMouth = pose.getPoseLandmark(PoseLandmark.RIGHT_MOUTH);

提升成效的訣竅

結果的品質取決於輸入圖片的品質:

  • 為了讓 ML Kit 準確偵測姿勢,影像中的人物必須以充足的像素資料表示;為獲得最佳效能,主體至少應為 256 x 256 像素。
  • 如果您在即時應用程式中偵測到姿勢,建議您也考量輸入圖片的整體尺寸。系統可更快處理較小的圖片,因此為了縮短延遲時間,以較低的解析度擷取圖片,但請留意上述解析度要求,並盡可能讓主體佔用大量圖片。
  • 圖像對焦品質不佳也可能會影響準確度。如果您沒有取得可接受的結果,請要求使用者重新拍攝圖片。

如要在即時應用程式中使用姿勢偵測功能,請遵循下列準則,以達到最佳影格速率:

  • 使用基本姿勢偵測 SDK 和 STREAM_MODE
  • 建議以較低的解析度拍攝圖片。也請注意,此 API 的圖片尺寸規定。
  • 如果使用 Cameracamera2 API,請調節偵測工具的呼叫頻率。如果在偵測工具執行時有新的影片影格,請捨棄影格。如需範例,請參閱快速入門導覽課程範例應用程式中的 VisionProcessorBase 類別。
  • 如果使用 CameraX API,請確定背壓策略已設為其預設值 ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST。這麼做可保證系統一次只會傳送一張圖片進行分析。如果分析器忙碌時產生更多圖片,系統會自動捨棄這些圖片,不會排入傳送佇列。呼叫 ImageProxy.close() 關閉要分析的圖片後,就會傳送下一個最新的映像檔。
  • 如果使用偵測工具的輸出內容在輸入圖片上重疊圖像,請先從 ML Kit 取得結果,然後透過一個步驟算繪圖片和疊加層。在每個輸入影格中,這個操作只會轉譯一次到顯示介面一次。如需範例,請參閱快速入門導覽課程範例應用程式中的 CameraSourcePreview GraphicOverlay 類別。
  • 如果你使用 Camera2 API,請擷取 ImageFormat.YUV_420_888 格式的圖片。如果您使用舊版 Camera API,請拍攝 ImageFormat.NV21 格式的圖片。

後續步驟