A Picture is Worth a Thousand Words, Faces, and Barcodes—The Shape Detection API

What is the Shape Detection API?

With APIs like navigator.mediaDevices.getUserMedia and the new Chrome for Android photo picker, it has become fairly easy to capture images or live video data from device cameras, or to upload local images. Until now, this dynamic image data, as well as static images on a page, has not been accessible to code, even though images may contain a lot of interesting features such as faces, barcodes, and text.

For example, in the past, if developers wanted to extract such features on the client to build a QR code reader, they had to rely on external JavaScript libraries. This could be expensive from a performance point of view and increase the overall page weight. On the other hand, operating systems including Android, iOS, and macOS, but also hardware chips found in camera modules, typically already have performant and highly optimized feature detectors such as the Android FaceDetector or the iOS generic feature detector, CIDetector.

The Shape Detection API exposes these native implementations through a set of JavaScript interfaces. Currently, the supported features are face detection through the FaceDetector interface, barcode detection through the BarcodeDetector interface, and text detection (optical character recognition, OCR) through the TextDetector interface.

Read explainer

Suggested use cases

As outlined above, the Shape Detection API currently supports the detection of faces, barcodes, and text. The following bullet list contains examples of use cases for all three features.

Face detection

  • Online social networking or photo sharing sites commonly let their users annotate people in images. Highlighting the boundaries of detected faces makes this task easier.
  • Content sites can dynamically crop images based on potentially detected faces rather than relying on other heuristics, or highlight detected faces with Ken Burns-like panning and zooming effects in story-like formats.
  • Multimedia messaging sites can allow their users to overlay funny objects like sunglasses or mustaches on detected face landmarks.

Barcode detection

  • Web applications that read QR codes can unlock interesting use cases like online payments or web navigation, or use barcodes for establishing social connections on messenger applications.
  • Shopping apps can allow their users to scan EAN or UPC barcodes of items in a physical store to compare prices online.
  • Airports can provide web kiosks where passengers can scan their boarding passes’ Aztec codes to show personalized information related to their flights.

Text detection

  • Online social networking sites can improve the accessibility of user-generated image content by adding detected texts as alt attributes for <img> tags when no other descriptions are provided.
  • Content sites can use text detection to avoid placing headings on top of hero images with contained text.
  • Web applications can use text detection to translate texts such as restaurant menus.

Current status

Step                                        Status
1. Create explainer                         Complete
2. Create initial draft of specification    In progress
3. Gather feedback & iterate on design      In progress
4. Origin trial                             In progress
5. Launch                                   Not started

How to use the Shape Detection API

The interfaces of all three detectors, FaceDetector, BarcodeDetector, and TextDetector, are similar. They all provide a single asynchronous method called detect() that takes an ImageBitmapSource as an input (that is, either a CanvasImageSource, a Blob, or ImageData).

For FaceDetector and BarcodeDetector, optional parameters can be passed to the detector’s constructor that allow for providing hints to the underlying native detectors.

Working with the FaceDetector

The FaceDetector always returns the bounding boxes of faces it detects in the ImageBitmapSource. Depending on the platform, more information regarding face landmarks like eyes, nose, or mouth may be available.

const faceDetector = new FaceDetector({
  // (Optional) Hint to try and limit the number of detected faces
  // on the scene to this maximum number.
  maxDetectedFaces: 5,
  // (Optional) Hint to try and prioritize speed over accuracy
  // by, e.g., operating on a reduced scale or looking for large features.
  fastMode: false
});
try {
  const faces = await faceDetector.detect(image);
  faces.forEach(face => drawMustache(face));
} catch (e) {
  console.error('Face detection failed:', e);
}
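
Because the bounding boxes are the one piece of information every platform returns, they are a reasonable basis for the face-aware cropping use case mentioned earlier. The following is a minimal sketch, not part of the API: cropRectForFaces and its padding parameter are hypothetical names, and the only thing it assumes about each result is a boundingBox with x, y, width, and height.

```javascript
// Hypothetical helper (not part of the Shape Detection API): compute one crop
// rectangle that contains every detected face, padded and clamped to the image.
function cropRectForFaces(faces, imageWidth, imageHeight, padding = 0.2) {
  if (faces.length === 0) {
    // No faces found: fall back to the full image.
    return {x: 0, y: 0, width: imageWidth, height: imageHeight};
  }
  let left = Infinity, top = Infinity, right = -Infinity, bottom = -Infinity;
  for (const {boundingBox: box} of faces) {
    left = Math.min(left, box.x);
    top = Math.min(top, box.y);
    right = Math.max(right, box.x + box.width);
    bottom = Math.max(bottom, box.y + box.height);
  }
  // Grow the union of all boxes by a fraction of its size on each side.
  const padX = (right - left) * padding;
  const padY = (bottom - top) * padding;
  left = Math.max(0, left - padX);
  top = Math.max(0, top - padY);
  right = Math.min(imageWidth, right + padX);
  bottom = Math.min(imageHeight, bottom + padY);
  return {x: left, y: top, width: right - left, height: bottom - top};
}
```

The resulting rectangle could then be passed as the source rectangle to a canvas context's drawImage() call to render the cropped region.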

Working with the BarcodeDetector

The BarcodeDetector returns the barcode raw values it finds in the ImageBitmapSource and the bounding boxes, as well as other information like the formats of the detected barcodes.

const barcodeDetector = new BarcodeDetector({
  // (Optional) A series of barcode formats to search for.
  // Not all formats may be supported on all platforms
  formats: [
    'aztec',
    'code_128',
    'code_39',
    'code_93',
    'codabar',
    'data_matrix',
    'ean_13',
    'ean_8',
    'itf',
    'pdf417',
    'qr_code',
    'upc_a',
    'upc_e'
  ]
});
try {
  const barcodes = await barcodeDetector.detect(image);
  barcodes.forEach(barcode => searchProductDatabase(barcode));
} catch (e) {
  console.error('Barcode detection failed:', e);
}
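
Since each result carries both a raw value and a format, a shopping app would probably want to filter the results before looking anything up. The sketch below assumes each result exposes rawValue and format properties; productCodes and RETAIL_FORMATS are hypothetical names chosen for illustration.

```javascript
// Formats typically printed on retail products (taken from the list above).
const RETAIL_FORMATS = new Set(['ean_13', 'ean_8', 'upc_a', 'upc_e']);

// Hypothetical helper: keep only retail barcodes and drop duplicate values,
// for example when the same code is visible twice in one frame.
function productCodes(barcodes) {
  const seen = new Set();
  const codes = [];
  for (const {format, rawValue} of barcodes) {
    if (!RETAIL_FORMATS.has(format) || seen.has(rawValue)) continue;
    seen.add(rawValue);
    codes.push({format, value: rawValue});
  }
  return codes;
}
```

Used with the snippet above, this would look like productCodes(await barcodeDetector.detect(image)).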

Working with the TextDetector

The TextDetector always returns the bounding boxes of the detected texts, and on some platforms the recognized characters.

const textDetector = new TextDetector();
try {
  const texts = await textDetector.detect(image);
  texts.forEach(text => textToSpeech(text));
} catch (e) {
  console.error('Text detection failed:', e);
}
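
Because recognized characters are only available on some platforms, code that consumes the results should tolerate missing text. As a rough sketch of the alt attribute use case mentioned earlier, the hypothetical helper altTextFrom below assumes each result has a boundingBox and, where supported, a rawValue string. Sorting top to bottom and then left to right is only a crude reading-order heuristic.

```javascript
// Hypothetical helper: build a candidate alt text from detection results.
// Returns null when the platform reported text locations but no characters.
function altTextFrom(texts) {
  const parts = texts
      .slice() // do not reorder the caller's array
      .sort((a, b) =>
        a.boundingBox.y - b.boundingBox.y || a.boundingBox.x - b.boundingBox.x)
      .map(text => (text.rawValue || '').trim())
      .filter(value => value.length > 0);
  return parts.length > 0 ? parts.join(' ') : null;
}
```

A page could then set img.alt only when the helper returns a string and the image has no other description.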

Feature detection

Purely checking for the existence of the constructors to feature detect the Shape Detection API doesn’t suffice, as Chrome on Linux and Chrome OS currently still expose the detectors, but they are known to not work (bug). As a temporary measure, we instead recommend a defensive programming approach by doing feature detection like this:

const supported = await (async () => 'FaceDetector' in window &&
    await new FaceDetector().detect(document.createElement('canvas'))
    .then(_ => true)
    .catch(e => e.name !== 'NotSupportedError'))();
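
The same defensive check can be applied to the other two detectors. The following sketch generalizes it under the assumption that they signal missing platform support in the same way. The function name isDetectorSupported and its scope and probe parameters are hypothetical; they exist mainly so that the check can be exercised outside a browser.

```javascript
// Hypothetical generalization of the check above. `scope` is the global object
// to look the constructor up on; `probe` is the image source handed to
// detect(). The default probe assumes a document is available.
async function isDetectorSupported(
    name, scope = globalThis, probe = document.createElement('canvas')) {
  if (!(name in scope)) return false;
  try {
    await new scope[name]().detect(probe);
    return true;
  } catch (e) {
    // As in the snippet above, only NotSupportedError is treated as
    // "not supported"; any other error is assumed to be unrelated.
    return e.name !== 'NotSupportedError';
  }
}
```

For example, await isDetectorSupported('BarcodeDetector') would check for barcode detection on the current page.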

Best practices

All detectors work asynchronously; that is, they do not block the main thread. Don't rely on real-time detection, but rather give the detector some time to do its work.
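
One way to allow for that time when processing live video is to skip frames while a detection is still pending, instead of queuing a new detect() call for every frame. The sketch below illustrates the idea; createBusyAwareDetect and detectIfIdle are hypothetical names, and the only assumption about the detector is that its detect() method returns a promise.

```javascript
// Hypothetical wrapper: returns a function that runs detector.detect() only
// when no earlier detection is still in flight, and resolves to null otherwise.
function createBusyAwareDetect(detector) {
  let busy = false;
  return async function detectIfIdle(source) {
    if (busy) return null; // Previous detection still running: skip this frame.
    busy = true;
    try {
      return await detector.detect(source);
    } finally {
      busy = false;
    }
  };
}
```

Inside a requestAnimationFrame() loop, the page would call detectIfIdle(video) on every frame and update its overlay only when the result is not null.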

If you are a fan of Web Workers (and who isn’t?), you'll be happy to know that detectors are exposed there as well. Detection results are serializable and can thus be passed from the worker to the main app via postMessage(). The demo shows this in action.
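
A worker-based setup might look roughly like the sketch below. This is an illustration rather than the code of the demo: the file name worker.js, the message shape ({id, image}), and the helper handleDetectRequest are all assumptions.

```javascript
// worker.js (sketch). Hypothetical helper: run one detection and post either
// the results or the error name back to the caller.
async function handleDetectRequest(detector, {id, image}, post) {
  try {
    const faces = await detector.detect(image);
    post({id, faces});
  } catch (e) {
    post({id, error: e.name});
  }
}

// Only wire up the handler when running inside a worker that exposes
// FaceDetector, so that loading this file elsewhere has no side effects.
if (typeof importScripts === 'function' && typeof FaceDetector === 'function') {
  const detector = new FaceDetector();
  self.onmessage = event =>
    handleDetectRequest(detector, event.data, message => self.postMessage(message));
}

// Main thread (assumes `source` is an <img>, <video>, or <canvas> element):
//   const worker = new Worker('worker.js');
//   worker.onmessage = ({data}) => console.log(data.faces || data.error);
//   const image = await createImageBitmap(source);
//   worker.postMessage({id: 1, image}, [image]); // Transfer the bitmap.
```

Sending an ImageBitmap as a transferable avoids copying the pixel data when handing the frame to the worker.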

Not all platform implementations support all features, so be sure to check the support situation carefully and use the API as a progressive enhancement. For example, some platforms might support face detection per se, but not face landmark detection (eyes, nose, mouth, etc.); or the existence and the location of text may be recognized, but not text contents.
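
In practice, progressive enhancement means using the richer data when it is present and degrading gracefully when it is not. The sketch below applies this to the sunglasses overlay use case: it uses eye landmarks if the platform reports them and otherwise estimates the eye positions from the bounding box. The helper names are hypothetical, the landmark shape (a type plus a list of locations) is an assumption about the result format, and the 30%, 70%, and 40% ratios are arbitrary placeholders.

```javascript
// Average of a non-empty list of {x, y} points.
function centroid(points) {
  const sum = points.reduce(
      (acc, p) => ({x: acc.x + p.x, y: acc.y + p.y}), {x: 0, y: 0});
  return {x: sum.x / points.length, y: sum.y / points.length};
}

// Hypothetical helper: locate both eyes, preferring landmarks over guesswork.
function estimateEyes(face) {
  const eyes = (face.landmarks || [])
      .filter(l => l.type === 'eye' && l.locations && l.locations.length > 0)
      .map(l => centroid(l.locations))
      .sort((a, b) => a.x - b.x);
  if (eyes.length >= 2) {
    return {source: 'landmarks', left: eyes[0], right: eyes[eyes.length - 1]};
  }
  // Fallback: rough guess based only on the bounding box.
  const {x, y, width, height} = face.boundingBox;
  return {
    source: 'boundingBox',
    left: {x: x + width * 0.3, y: y + height * 0.4},
    right: {x: x + width * 0.7, y: y + height * 0.4},
  };
}
```

The source field lets calling code decide, for example, whether to rotate the overlay to match the eye line or to keep it level.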

Feedback

We need your help to ensure that the Shape Detection API meets your needs and that we’re not missing any key scenarios.

We’re also interested to hear how you plan to use the Shape Detection API:

  • Have an idea for a use case, or a project where you’d use it?
  • Do you plan to use this?
  • Like it, and want to show your support?

Share your thoughts on the Shape Detection API WICG Discourse discussion.
