Depth API overview for Android

The Depth API uses a Depth API-supported device’s RGB camera to create depth maps (also called depth images). You can then use the information provided by a depth map to make virtual objects accurately appear in front of or behind real-world objects, enabling immersive and realistic user experiences.

For example, the following images show a virtual Android figure in a real space containing a trunk beside a door. In the first image, the virtual Android unrealistically overlaps with the edge of the trunk. In the second image, the Android is properly occluded, appearing much more realistic in its surroundings.

Depth API-supported devices

Refer to the ARCore supported devices page for a list of devices that support the Depth API.
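Because support varies by device, it is also good practice to check for Depth API support at runtime before enabling it. The following is a minimal sketch, assuming an ARCore Session named session has already been created:

    // Enable depth only if the current device and camera configuration support it.
    Config config = session.getConfig();
    if (session.isDepthModeSupported(Config.DepthMode.AUTOMATIC)) {
      config.setDepthMode(Config.DepthMode.AUTOMATIC);
    }
    session.configure(config);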

Depth maps

The Depth API uses a depth-from-motion algorithm to create depth maps, which you can obtain using acquireDepthImage().
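For example, assuming depth has been enabled on the session and frame is the Frame returned by session.update() in your render loop, acquiring the current depth image looks roughly like this (Image is android.media.Image, and NotYetAvailableException comes from com.google.ar.core.exceptions):

    // Acquire the latest depth image for this frame. try-with-resources ensures
    // the Image is closed and its underlying buffer is released promptly.
    try (Image depthImage = frame.acquireDepthImage()) {
      // Read depth values from depthImage here.
    } catch (NotYetAvailableException e) {
      // Depth data is not available yet; this is normal during the first frames
      // of a session or while the device has not moved enough.
    }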

This algorithm takes multiple device images from different angles and compares them to estimate the distance to every pixel as a user moves their phone. If the device has an active depth sensor, such as a time-of-flight sensor (or ToF sensor), that data is automatically included in the processed depth. This enhances the existing depth map and enables depth even when the camera is not moving. It also provides better depth on surfaces with few or no features, such as white walls, or in dynamic scenes with moving people or objects.

The following images show a camera image of a hallway with a bicycle on the wall, and a visualization of the depth map that is created from the camera images.

Format of a depth map

Each depth map is represented as an Image.

ARCore uses the DEPTH16 format for depth maps, with the three most-significant bits always set to 000.

Each pixel in the depth map is represented by an unsigned 16-bit integer. The least significant 13 bits contain the distance in millimeters to the estimated surface from the camera's image plane, along the camera's optical axis.
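As an illustrative sketch, the distance at a given depth-image pixel can be read from the image's single plane and masked down to the low 13 bits. Here, getMillimetersDepth is a hypothetical helper name, not part of the API; ByteBuffer and ByteOrder are from java.nio:

    /** Returns the distance in millimeters at depth-image pixel (x, y). */
    public int getMillimetersDepth(Image depthImage, int x, int y) {
      // DEPTH16 images have a single plane of 16-bit samples.
      Image.Plane plane = depthImage.getPlanes()[0];
      int byteIndex = x * plane.getPixelStride() + y * plane.getRowStride();
      ByteBuffer buffer = plane.getBuffer().order(ByteOrder.nativeOrder());
      short depthSample = buffer.getShort(byteIndex);
      // Keep only the low 13 bits, which hold the distance in millimeters.
      return depthSample & 0x1FFF;
    }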

Requirements for motion

The Depth API uses motion tracking, and the active depth sensor if present. The depth-from-motion algorithm effectively treats pairs of camera images as two observations of an assumed static scene. If parts of the environment move — for example, if a person moves in front of the camera — the static components of the scene will have an accurate estimation of depth, but the parts that have moved will not.

For depth to work well on devices without an active depth sensor, the user will need to move their device at least a few centimeters.

Ensuring accurate depth reading

Depth is provided for scenes between 0 and 8 meters. Optimal depth accuracy is achieved between 0.5 meters and 5 meters from the camera. Error increases quadratically as distance from the camera increases.

In the absence of an active depth sensor, only RGB color information is used to estimate depth. Surfaces with few or no features, such as white walls, will be associated with imprecise depth.

Understanding depth values

Given a point A on the observed real-world geometry, a 2D point a representing the same point in the depth image, and the camera origin C, the value given by the Depth API at a is equal to the length of CA projected onto the principal axis. This can also be referred to as the z-coordinate of A relative to the camera origin C. When working with the Depth API, it is important to understand that the depth values are not the length of the ray CA itself, but its projection onto the principal axis.
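To recover the full 3D position of A rather than only its z-coordinate, the depth value must be combined with the pixel coordinates and the camera intrinsics. The following is a minimal sketch using the pinhole camera model; depthPixelToCameraSpace is a hypothetical helper, it ignores axis-sign conventions of the target coordinate system, and it assumes the intrinsics have been scaled to the depth image resolution (the values returned by Camera.getImageIntrinsics() correspond to the CPU image size, not the depth image size):

    // Converts a depth value at depth-image pixel (u, v) into a 3D point in the
    // camera coordinate frame. The depth value itself is the z-coordinate, not
    // the length of the ray CA.
    float[] depthPixelToCameraSpace(float depthMeters, int u, int v, CameraIntrinsics intrinsics) {
      float[] focalLength = intrinsics.getFocalLength();       // {fx, fy}
      float[] principalPoint = intrinsics.getPrincipalPoint(); // {cx, cy}
      float x = depthMeters * (u - principalPoint[0]) / focalLength[0];
      float y = depthMeters * (v - principalPoint[1]) / focalLength[1];
      return new float[] {x, y, depthMeters};
    }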

Next steps

Start using the Depth API in your own apps. To learn more, see: