Fundamental Concepts

The primary difference between VR and traditional apps is that with VR, the user is immersed within a virtual world. Users can move their head around to change their perspective, and input devices like the Daydream controller allow them to interact with different parts of the virtual world. VR apps need to handle these changes in viewpoint and controller input in a fluid, realistic way in order for the immersion to feel comfortable and real.

This sense of immersion is built through thoughtful design and a combination of display, tracking, and audio technologies.

Display

VR displays are the window through which the user sees your world. The display can be the built-in screen of a Daydream Standalone headset or the screen of your phone when placed within a Daydream View. A lens in front of the screen warps the image to fill the user's view.

VR displays need to render images quickly so that, as the user moves their head and controllers, the display updates without noticeable delay. Displays generally fall into one of two categories: "sample and hold" or "low persistence". Most traditional smartphone and computer displays are "sample and hold", which can cause motion blur while the user's head is moving. Displays tuned for VR offer a low-persistence mode, which reduces that blur.

Rendering Latency

The time from when the user moves their head or an input device to the time when that change appears on the display is referred to as motion-to-photon latency. For most users to be comfortable and for VR to feel immersive, the motion-to-photon latency needs to be under 20ms.

If the latency is too high, users will perceive lag or, in the worst case, experience motion sickness. If latency is low, users have a more comfortable and immersive experience.

The main challenge in achieving this is the very short time available to render each new frame. Computers are complex, and while each individual task completes quickly, those tasks stack up and contribute to the system's overall latency.
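As a rough illustration of how quickly those individual tasks eat into the frame budget, the sketch below adds up a few hypothetical pipeline stages; the stage durations are made-up placeholders, not measured Daydream numbers.

    // Illustrative only: the stage durations are invented for the example.
    public final class LatencyBudget {
        public static void main(String[] args) {
            double refreshHz = 60.0;                    // typical mobile VR display refresh rate
            double frameBudgetMs = 1000.0 / refreshHz;  // ~16.7 ms available per frame

            // Hypothetical stages that all contribute to motion-to-photon latency.
            double sensorReadMs = 2.0;
            double appAndRenderMs = 11.0;
            double compositeAndScanoutMs = 8.0;

            double endToEndMs = sensorReadMs + appAndRenderMs + compositeAndScanoutMs;
            System.out.printf("Frame budget: %.1f ms, end-to-end latency: %.1f ms%n",
                    frameBudgetMs, endToEndMs);
        }
    }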

To maintain good motion-to-photon latency even when system latency is greater than ideal, Daydream uses predictive technologies and asynchronous reprojection to produce an effective motion-to-photon latency that is lower than the total end-to-end latency. See the Prediction section for more details.
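The arithmetic behind that claim can be sketched as follows. The numbers are assumed purely for illustration; the point is that if the pose used for the displayed image is predicted forward to roughly the moment of scanout, the latency the user perceives is far smaller than the full pipeline latency.

    // Illustrative numbers only, not measured Daydream values.
    public final class EffectiveLatency {
        public static void main(String[] args) {
            double endToEndLatencyMs = 21.0;      // full sensor-to-display pipeline
            double predictionLookaheadMs = 16.0;  // how far ahead the rendered pose is predicted
            double effectiveLatencyMs = endToEndLatencyMs - predictionLookaheadMs;
            System.out.printf("Effective motion-to-photon latency: ~%.0f ms%n", effectiveLatencyMs);
        }
    }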

Display Persistence

Display persistence refers to how long a given frame (image) remains lit on the screen before the next image is drawn to the display. High-persistence displays cause motion blur, because each frame is displayed to the user for a large proportion of the total frame time, which in a VR environment is uncomfortable and breaks immersion. VR is best experienced using displays in low-persistence mode, which reduces the amount of motion blur. In low-persistence mode, each frame is displayed for a very small period of time, with the screen dark for the remaining time before the next frame arrives. Because all of this happens within milliseconds, it creates the illusion of crisp motion.
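The difference can be made concrete with a small sketch; the 2 ms lit time below is an assumed illustrative value, not the specification of any particular panel.

    // Contrast "sample and hold" with low persistence at a 60 Hz refresh rate.
    public final class PersistenceExample {
        public static void main(String[] args) {
            double frameIntervalMs = 1000.0 / 60.0;      // ~16.7 ms between frames
            double sampleAndHoldLitMs = frameIntervalMs; // pixels stay lit for the whole frame
            double lowPersistenceLitMs = 2.0;            // pixels lit briefly, then dark until the next frame

            System.out.printf("Sample and hold: lit %.0f%% of the frame%n",
                    100.0 * sampleAndHoldLitMs / frameIntervalMs);
            System.out.printf("Low persistence: lit %.0f%% of the frame%n",
                    100.0 * lowPersistenceLitMs / frameIntervalMs);
        }
    }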

A small number of users are sensitive to the flickering effect of low-persistence displays. Users can turn off low-persistence mode in VR settings.

Tracking

Tracking is the use of visual and inertial sensors to model the user's position relative to the virtual world. Apps need to know precisely how the user's head and controllers are moving in order to map their position and orientation into the virtual world. For tracking to provide a comfortable and immersive experience, it must be low-latency, highly accurate, and consistent.

A tracking system outputs what's called a pose - a set of data that describes the position and orientation of a tracked object, such as the user's head or controller. Apps use that pose to render the world correctly and to play appropriate audio.
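As a minimal sketch of what a pose carries, the class below holds a position and an orientation; the field names are illustrative and not the Daydream SDK's actual types.

    // Illustrative pose container: position in meters plus orientation as a unit quaternion.
    public final class Pose {
        // Position of the tracked object (head or controller) relative to the tracking origin.
        // A 3DoF device reports only orientation, so its position stays at the origin.
        public final float x, y, z;

        // Orientation as a unit quaternion.
        public final float qx, qy, qz, qw;

        public Pose(float x, float y, float z,
                    float qx, float qy, float qz, float qw) {
            this.x = x; this.y = y; this.z = z;
            this.qx = qx; this.qy = qy; this.qz = qz; this.qw = qw;
        }
    }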

Tracking for 3DoF devices is done via the phone's IMU. Tracking for a 6DoF headset, like the Daydream Standalone, uses more powerful WorldSense tracking, which utilizes a combination of computer vision and IMU measurements.

Prediction

Tracking isn't perfectly accurate, and it isn't instantaneous. While modern IMUs are fairly accurate, the error in their measurements adds up quickly and can produce noisy results. If we were to use the raw IMU data to generate poses and render the scene, the user would experience a lot of jitter. To smooth things out, these sensor readings must be averaged over a period of time. That averaging, combined with system latency, causes a delay, which adds to the motion-to-photon latency described in the Display section.
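The idea of averaging noisy readings can be sketched with a simple exponential smoothing filter; a real tracker fuses gyroscope and accelerometer data with far more sophistication, so treat this only as an illustration of the jitter-versus-delay trade-off.

    // Exponential smoothing of a single noisy sensor channel.
    public final class SensorSmoother {
        private final float alpha;   // smoothing factor in (0, 1]; smaller values smooth more but lag more
        private float smoothed;
        private boolean initialized;

        public SensorSmoother(float alpha) {
            this.alpha = alpha;
        }

        public float filter(float rawSample) {
            if (!initialized) {
                smoothed = rawSample;
                initialized = true;
            } else {
                smoothed = alpha * rawSample + (1 - alpha) * smoothed;
            }
            return smoothed;
        }
    }

A smaller smoothing factor produces steadier output but more delay, which is exactly the latency that prediction then has to compensate for.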

To combat this delay, Daydream adjusts the raw pose data based on an internal prediction twice: once right away, based on the direction the user's head appears to be moving when a frame starts rendering, and again just before drawing the final image to the display, using a technology called Asynchronous Reprojection.
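A minimal sketch of the prediction step, assuming the head keeps rotating at its current angular velocity for the expected latency interval; real systems also predict position and repeat the step during Asynchronous Reprojection, and the quaternion update below uses a small-angle approximation.

    // Extrapolate a head orientation forward in time from its angular velocity.
    public final class PosePredictor {
        // orientation and the result are unit quaternions {x, y, z, w};
        // angularVelocity is in radians per second; dtSeconds is the prediction interval.
        public static float[] predict(float[] orientation, float[] angularVelocity, float dtSeconds) {
            // Small-angle rotation quaternion covering angularVelocity * dt.
            float[] delta = {
                0.5f * angularVelocity[0] * dtSeconds,
                0.5f * angularVelocity[1] * dtSeconds,
                0.5f * angularVelocity[2] * dtSeconds,
                1.0f
            };
            return normalize(multiply(delta, orientation));
        }

        private static float[] multiply(float[] a, float[] b) {
            return new float[] {
                a[3] * b[0] + a[0] * b[3] + a[1] * b[2] - a[2] * b[1],
                a[3] * b[1] - a[0] * b[2] + a[1] * b[3] + a[2] * b[0],
                a[3] * b[2] + a[0] * b[1] - a[1] * b[0] + a[2] * b[3],
                a[3] * b[3] - a[0] * b[0] - a[1] * b[1] - a[2] * b[2]
            };
        }

        private static float[] normalize(float[] q) {
            float n = (float) Math.sqrt(q[0] * q[0] + q[1] * q[1] + q[2] * q[2] + q[3] * q[3]);
            return new float[] {q[0] / n, q[1] / n, q[2] / n, q[3] / n};
        }
    }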

Spatial Audio

Spatial audio, which replicates how real sound waves interact with human ears and the environment, is critical to immersion. Spatial audio takes into account the user's position relative to each audio source, as well as the surface materials and shape of the environment.

Spatial audio differs from traditional stereo sound, which accounts only for the directionality of and distance to each sound source.
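The listener-relative geometry a spatializer starts from can be sketched as below; the names are illustrative, and a full engine such as Resonance Audio additionally rotates the direction into the listener's head frame and models room surfaces, occlusion, and reverb.

    // Direction and distance from the listener to a sound source, in world space.
    public final class SpatialAudioGeometry {
        // Returns {dirX, dirY, dirZ, distance}; positions are {x, y, z} in meters.
        public static float[] directionAndDistance(float[] listenerPos, float[] sourcePos) {
            float dx = sourcePos[0] - listenerPos[0];
            float dy = sourcePos[1] - listenerPos[1];
            float dz = sourcePos[2] - listenerPos[2];
            float distance = (float) Math.sqrt(dx * dx + dy * dy + dz * dz);
            if (distance == 0f) {
                return new float[] {0f, 0f, 0f, 0f};
            }
            return new float[] {dx / distance, dy / distance, dz / distance, distance};
        }
    }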

For more information, check out Resonance Audio.