Selfie segmentation with ML Kit on iOS

AI-generated Key Takeaways

ML Kit's Selfie Segmenter API enables you to segment selfies in real-time or single images, offering options for stream or single image modes.
To use the API, you'll need to integrate the GoogleMLKit/SegmentationSelfie pod, create a Segmenter instance, prepare a VisionImage, and process it to obtain a segmentation mask.
You can customize the segmentation process by enabling raw size mask or choosing different segmenter modes based on your use case.
For optimal performance, ensure images are at least 256x256 pixels, consider lower resolutions for real-time applications, and leverage the synchronous API for video frames.
This API is currently in beta and may be subject to changes that break backward compatibility.

ML Kit provides an optimized SDK for selfie segmentation. The Selfie Segmenter assets are statically linked to your app at build time, which increases your app size by up to 24MB. API latency varies from ~7ms to ~12ms depending on the input image size, as measured on iPhone X.

Try it out

Play around with the sample app to see an example usage of this API.

Before you begin

Include the following ML Kit libraries in your Podfile:
```
pod 'GoogleMLKit/SegmentationSelfie', '8.0.0'
```
After you install or update your project’s Pods, open your Xcode project using its .xcworkspace. ML Kit is supported in Xcode version 13.2.1 or higher.

1. Create an instance of Segmenter

To perform segmentation on a selfie image, first create an instance of Segmenter with SelfieSegmenterOptions and optionally specify the segmentation settings.

Segmenter options

Segmenter Mode

The Segmenter operates in two modes. Be sure you choose the one that matches your use case.

STREAM_MODE (default)

This mode is designed for streaming frames from video or camera. In this mode, the segmenter will leverage results from previous frames to return smoother segmentation results.

SINGLE_IMAGE_MODE

This mode is designed for single images that are not related. In this mode, the segmenter will process each image independently, with no smoothing over frames.

Enable raw size mask

Asks the segmenter to return the raw size mask which matches the model output size.

The raw mask size (e.g. 256x256) is usually smaller than the input image size.

Without specifying this option, the segmenter will rescale the raw mask to match the input image size. Consider using this option if you want to apply customized rescaling logic or rescaling is not needed for your use case.
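As a rough illustration of what customized rescaling logic could look like, the sketch below maps a pixel in the input image to the nearest pixel in a smaller raw mask. The function name `maskCoordinate` and its parameters are hypothetical and not part of ML Kit, and the sketch assumes the mask and the image share the same orientation.

```swift
/// Hypothetical helper (not an ML Kit API): maps a pixel coordinate in the
/// input image to the nearest pixel in a smaller raw-size mask.
/// Assumes the mask and the image have the same orientation.
func maskCoordinate(
  forImageX x: Int, imageY y: Int,
  imageWidth: Int, imageHeight: Int,
  maskWidth: Int, maskHeight: Int
) -> (col: Int, row: Int) {
  precondition(imageWidth > 0 && imageHeight > 0, "Image must not be empty")
  precondition(maskWidth > 0 && maskHeight > 0, "Mask must not be empty")
  // Scale by the size ratio, then clamp so edge pixels stay inside the mask.
  let col = min(maskWidth - 1, x * maskWidth / imageWidth)
  let row = min(maskHeight - 1, y * maskHeight / imageHeight)
  return (col, row)
}

// Example: the center of a 1080x1920 image falls on the center of a 256x256 mask.
let center = maskCoordinate(
  forImageX: 540, imageY: 960,
  imageWidth: 1080, imageHeight: 1920,
  maskWidth: 256, maskHeight: 256)
```

Nearest-neighbor lookup like this is cheap but produces blocky edges; an app that needs smooth edges would likely interpolate between neighboring mask values instead.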

Specify the segmenter options:

Swift

let options = SelfieSegmenterOptions()
options.segmenterMode = .singleImage
options.shouldEnableRawSizeMask = true

Objective-C

MLKSelfieSegmenterOptions *options = [[MLKSelfieSegmenterOptions alloc] init];
options.segmenterMode = MLKSegmenterModeSingleImage;
options.shouldEnableRawSizeMask = YES;

Finally, get an instance of Segmenter. Pass the options you specified:

Swift

let segmenter = Segmenter.segmenter(options: options)

Objective-C

MLKSegmenter *segmenter = [MLKSegmenter segmenterWithOptions:options];
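The snippets above configure single image mode. For comparison, here is a minimal sketch of a helper that picks the mode from how frames will arrive. The helper name `makeSelfieSegmenter(forLiveVideo:)` is hypothetical, and the umbrella `import MLKit` is an assumption about how the pod exposes its modules in your project.

```swift
import MLKit  // Assumption: umbrella module provided by the GoogleMLKit pods.

/// Hypothetical helper (not an ML Kit API): creates a segmenter whose mode
/// matches the kind of input it will receive.
func makeSelfieSegmenter(forLiveVideo isLiveVideo: Bool) -> Segmenter {
  let options = SelfieSegmenterOptions()
  // Stream mode smooths results across frames; single image mode treats
  // every image independently.
  options.segmenterMode = isLiveVideo ? .stream : .singleImage
  return Segmenter.segmenter(options: options)
}

// Usage: one segmenter for the camera feed, another for photo library picks.
let cameraSegmenter = makeSelfieSegmenter(forLiveVideo: true)
let photoSegmenter = makeSelfieSegmenter(forLiveVideo: false)
```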

2. Prepare the input image

To segment selfies, do the following for each image or frame of video. If you enabled stream mode, you must create VisionImage objects from CMSampleBuffers.

Create a VisionImage object using a UIImage or a CMSampleBuffer.

If you use a UIImage, follow these steps:

Create a VisionImage object with the UIImage. Make sure to specify the correct .orientation.

Swift

// `image` is the UIImage you want to segment.
let visionImage = VisionImage(image: image)
visionImage.orientation = image.imageOrientation

Objective-C

MLKVisionImage *visionImage = [[MLKVisionImage alloc] initWithImage:image];
visionImage.orientation = image.imageOrientation;

If you use a CMSampleBuffer, follow these steps:

Specify the orientation of the image data contained in the CMSampleBuffer.

To get the image orientation:

Swift

func imageOrientation(
  deviceOrientation: UIDeviceOrientation,
  cameraPosition: AVCaptureDevice.Position
) -> UIImage.Orientation {
  switch deviceOrientation {
  case .portrait:
    return cameraPosition == .front ? .leftMirrored : .right
  case .landscapeLeft:
    return cameraPosition == .front ? .downMirrored : .up
  case .portraitUpsideDown:
    return cameraPosition == .front ? .rightMirrored : .left
  case .landscapeRight:
    return cameraPosition == .front ? .upMirrored : .down
  case .faceDown, .faceUp, .unknown:
    return .up
  }
}

Objective-C

- (UIImageOrientation)
  imageOrientationFromDeviceOrientation:(UIDeviceOrientation)deviceOrientation
                         cameraPosition:(AVCaptureDevicePosition)cameraPosition {
  switch (deviceOrientation) {
    case UIDeviceOrientationPortrait:
      return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationLeftMirrored
                                                            : UIImageOrientationRight;

    case UIDeviceOrientationLandscapeLeft:
      return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationDownMirrored
                                                            : UIImageOrientationUp;
    case UIDeviceOrientationPortraitUpsideDown:
      return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationRightMirrored
                                                            : UIImageOrientationLeft;
    case UIDeviceOrientationLandscapeRight:
      return cameraPosition == AVCaptureDevicePositionFront ? UIImageOrientationUpMirrored
                                                            : UIImageOrientationDown;
    case UIDeviceOrientationUnknown:
    case UIDeviceOrientationFaceUp:
    case UIDeviceOrientationFaceDown:
      return UIImageOrientationUp;
  }
}

Create a VisionImage object using the CMSampleBuffer object and orientation:

Swift

let image = VisionImage(buffer: sampleBuffer)
image.orientation = imageOrientation(
  deviceOrientation: UIDevice.current.orientation,
  cameraPosition: cameraPosition)

Objective-C

 MLKVisionImage *image = [[MLKVisionImage alloc] initWithBuffer:sampleBuffer];
 image.orientation =
   [self imageOrientationFromDeviceOrientation:UIDevice.currentDevice.orientation
                                cameraPosition:cameraPosition];

3. Process the image

Pass the VisionImage object to one of the Segmenter's image processing methods. You can use either the asynchronous process(_:completion:) method or the synchronous results(in:) method.

To perform segmentation on a selfie image synchronously:

Swift

let mask: SegmentationMask
do {
  mask = try segmenter.results(in: image)
} catch let error {
  print("Failed to perform segmentation with error: \(error.localizedDescription).")
  return
}

// Success. Get a segmentation mask here.

Objective-C

NSError *error;
MLKSegmentationMask *mask =
    [segmenter resultsInImage:image error:&error];
if (error != nil) {
  // Error.
  return;
}

// Success. Get a segmentation mask here.

To perform segmentation on a selfie image asynchronously:

Swift

segmenter.process(image) { mask, error in
  guard error == nil else {
    // Error.
    return
  }
  // Success. Get a segmentation mask here.
}

Objective-C

[segmenter processImage:image
             completion:^(MLKSegmentationMask * _Nullable mask,
                          NSError * _Nullable error) {
               if (error != nil) {
                 // Error.
                 return;
               }
               // Success. Get a segmentation mask here.
             }];

4. Get the segmentation mask

You can get the segmentation result as follows:

Swift

let maskWidth = CVPixelBufferGetWidth(mask.buffer)
let maskHeight = CVPixelBufferGetHeight(mask.buffer)

CVPixelBufferLockBaseAddress(mask.buffer, CVPixelBufferLockFlags.readOnly)
let maskBytesPerRow = CVPixelBufferGetBytesPerRow(mask.buffer)
// Number of Float32 values per row, including any row padding.
let maskFloatsPerRow = maskBytesPerRow / MemoryLayout<Float32>.size
var maskAddress =
    CVPixelBufferGetBaseAddress(mask.buffer)!.bindMemory(
        to: Float32.self, capacity: maskFloatsPerRow * maskHeight)

for _ in 0..<maskHeight {
  for col in 0..<maskWidth {
    // Gets the confidence of the pixel in the mask being in the foreground.
    let foregroundConfidence: Float32 = maskAddress[col]
  }
  maskAddress += maskFloatsPerRow
}
// The base address is only valid while the buffer is locked, so unlock last.
CVPixelBufferUnlockBaseAddress(mask.buffer, CVPixelBufferLockFlags.readOnly)

Objective-C

size_t width = CVPixelBufferGetWidth(mask.buffer);
size_t height = CVPixelBufferGetHeight(mask.buffer);

CVPixelBufferLockBaseAddress(mask.buffer, kCVPixelBufferLock_ReadOnly);
size_t maskBytesPerRow = CVPixelBufferGetBytesPerRow(mask.buffer);
float *maskAddress = (float *)CVPixelBufferGetBaseAddress(mask.buffer);

for (size_t row = 0; row < height; ++row) {
  for (size_t col = 0; col < width; ++col) {
    // Gets the confidence of the pixel in the mask being in the foreground.
    float foregroundConfidence = maskAddress[col];
  }
  maskAddress += maskBytesPerRow / sizeof(float);
}
// The base address is only valid while the buffer is locked, so unlock last.
CVPixelBufferUnlockBaseAddress(mask.buffer, kCVPixelBufferLock_ReadOnly);

For a full example of how to use the segmentation results, please see the ML Kit quickstart sample.
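One simple thing to do with the per-pixel confidences is to threshold them. The sketch below is not from the quickstart sample: it assumes the confidences have already been copied into a plain Swift array, and the function name `foregroundFraction` and the 0.5 default threshold are illustrative choices. It honors a row stride that may be larger than the mask width, as in the loops above.

```swift
/// Hypothetical helper (not an ML Kit API): returns the fraction of mask
/// pixels whose foreground confidence is at or above `threshold`.
/// `confidences` is row-major; `rowStride` is the number of Float32 values
/// per row, which may exceed `width` when rows are padded.
func foregroundFraction(
  confidences: [Float32],
  width: Int,
  height: Int,
  rowStride: Int,
  threshold: Float32 = 0.5
) -> Double {
  guard width > 0, height > 0 else { return 0 }
  precondition(rowStride >= width, "Stride must cover a full row")
  precondition(
    confidences.count >= rowStride * (height - 1) + width,
    "Array is too short for the given dimensions")

  var foregroundCount = 0
  for row in 0..<height {
    let rowStart = row * rowStride
    // Only the first `width` values of each row are pixels; the rest is padding.
    for col in 0..<width where confidences[rowStart + col] >= threshold {
      foregroundCount += 1
    }
  }
  return Double(foregroundCount) / Double(width * height)
}

// A 2x2 mask with one padding value per row (the trailing 1.0 entries).
let sampleConfidences: [Float32] = [0.9, 0.1, 1.0, 0.6, 0.4, 1.0]
let sampleFraction = foregroundFraction(
  confidences: sampleConfidences, width: 2, height: 2, rowStride: 3)
```

A value near zero could be used, for example, to decide that no person is in the frame and skip any background replacement for it.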

Tips to improve performance

The quality of your results depends on the quality of the input image:

For ML Kit to get an accurate segmentation result, the image should be at least 256x256 pixels.
If you perform selfie segmentation in a real-time application, you might also want to consider the overall dimensions of the input images. Smaller images can be processed faster, so to reduce latency, capture images at lower resolutions, but keep in mind the above resolution requirements and ensure that the subject occupies as much of the image as possible.
Poor image focus can also impact accuracy. If you don't get acceptable results, ask the user to recapture the image.
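The trade-off between speed and the 256x256 minimum can be worked out with a little arithmetic. The sketch below is one possible interpretation, not an ML Kit API: it scales an image so its shorter side hits a requested target, never upscales, and never lets the target drop below 256 pixels. The function name `downscaledSize` and its parameters are hypothetical.

```swift
/// Hypothetical helper (not an ML Kit API): scales a size down so that its
/// shorter side equals `targetShortSide`, preserving aspect ratio.
/// The target is raised to `minimumSide` if it is smaller, and images that
/// are already at or below the target are returned unchanged.
func downscaledSize(
  width: Int,
  height: Int,
  targetShortSide: Int,
  minimumSide: Int = 256
) -> (width: Int, height: Int) {
  let shortSide = min(width, height)
  let target = max(targetShortSide, minimumSide)
  // Never upscale: upscaling adds processing cost without adding detail.
  guard shortSide > target else { return (width, height) }
  let scale = Double(target) / Double(shortSide)
  return (
    Int((Double(width) * scale).rounded()),
    Int((Double(height) * scale).rounded())
  )
}

// A 1080x1920 frame with a 360-pixel target becomes 360x640.
let reduced = downscaledSize(width: 1080, height: 1920, targetShortSide: 360)
// A target below the minimum is raised to 256 on the shorter side.
let floored = downscaledSize(width: 1080, height: 1920, targetShortSide: 100)
```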

If you want to use segmentation in a real-time application, follow these guidelines to achieve the best frame rates:

Use the stream segmenter mode.
Consider capturing images at a lower resolution. However, also keep in mind this API's image dimension requirements.
For processing video frames, use the results(in:) synchronous API of the segmenter. Call this method from the AVCaptureVideoDataOutputSampleBufferDelegate's captureOutput(_:didOutput:from:) function to synchronously get results from the given video frame. Keep AVCaptureVideoDataOutput's alwaysDiscardsLateVideoFrames set to true to throttle calls to the segmenter. If a new video frame becomes available while the segmenter is running, it will be dropped.
If you use the output of the segmenter to overlay graphics on the input image, first get the result from ML Kit, then render the image and overlay in a single step. By doing so, you render to the display surface only once for each processed input frame. See the previewOverlayView and CameraViewController classes in the ML Kit quickstart sample for an example.
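The real-time guidelines above can be sketched as a small sample buffer delegate. This is an illustrative outline rather than the quickstart's implementation: the class name `FrameSegmenter` and the `attach(to:queue:)` method are hypothetical, the umbrella `import MLKit` is an assumption about your pod setup, and the orientation is hard-coded for a device held in portrait.

```swift
import AVFoundation
import MLKit  // Assumption: umbrella module provided by the GoogleMLKit pods.
import UIKit

/// Hypothetical delegate (not part of ML Kit) that segments each camera frame
/// synchronously, following the real-time guidelines.
final class FrameSegmenter: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
  private let segmenter: Segmenter
  private let cameraPosition: AVCaptureDevice.Position

  init(cameraPosition: AVCaptureDevice.Position) {
    let options = SelfieSegmenterOptions()
    options.segmenterMode = .stream  // Smooths results across frames.
    self.segmenter = Segmenter.segmenter(options: options)
    self.cameraPosition = cameraPosition
    super.init()
  }

  /// Registers this object as the delegate of the given video output.
  func attach(to output: AVCaptureVideoDataOutput, queue: DispatchQueue) {
    // Frames that arrive while the segmenter is busy are dropped.
    output.alwaysDiscardsLateVideoFrames = true
    output.setSampleBufferDelegate(self, queue: queue)
  }

  func captureOutput(
    _ output: AVCaptureOutput,
    didOutput sampleBuffer: CMSampleBuffer,
    from connection: AVCaptureConnection
  ) {
    let image = VisionImage(buffer: sampleBuffer)
    // Assumption: portrait device orientation only.
    image.orientation = cameraPosition == .front ? .leftMirrored : .right

    do {
      let mask = try segmenter.results(in: image)
      // Render the frame and the overlay in a single step here.
      _ = mask
    } catch {
      print("Segmentation failed: \(error.localizedDescription)")
    }
  }
}
```

Because results(in:) blocks until the mask is ready, the queue passed to `attach(to:queue:)` should be a background serial queue rather than the main queue.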
