Optimizing performance is critical for creating a great VR user experience. Keep your app rendering at 60 fps or above to maintain immersion and prevent motion discomfort.
This guide shows you how to optimize your app’s performance. Learn about:
- How the CPU and GPU work in the graphics rendering pipeline
- Identifying and fixing common CPU, GPU, and other rendering bottlenecks
The VR rendering pipeline
CPU and GPU workloads
Your app manages state and is responsible for composing the virtual scene. The game engine is responsible for physics simulation and other tasks such as ensuring that final content is sent to the GPU for rendering. Both of these workloads run on the CPU.
Understanding draw calls
At the end of each frame, the game engine renders the final app image by sending a series of draw calls to the GPU. Each draw call is a batch of work that includes:
- Mesh data
- Associated materials and properties
- Shader details specified by the material
The game engine might do some basic object culling so that only objects visible to the camera are submitted to the GPU. However, even objects that are only partially visible are submitted in full to the GPU. Processing this hidden geometry adds to your rendering cost.
For each draw call, the GPU:
- Performs calculations for each vertex in the vertex shader
- Passes the results of these calculations to the fragment shader
- Determines the final pixel color for each fragment, based on interpolated results from the vertex shader
Avoiding overdraw in opaque and transparent objects
Opaque objects occlude, or hide, what is behind them, while transparent objects filter the light from the scene behind them.
To avoid drawing pixels twice:
Make appropriate use of GPU depth buffers
Game engines generally try to order draw calls for opaque objects from front to back. Shaders write out the depth of each drawn pixel so that subsequent draw calls for geometry that is farther from the camera but in the same pixel location can be skipped. This is known as Z-culling.
Make appropriate use of GPU stencil buffers
Similarly to the depth buffer, apps can use the GPU's stencil buffer to prevent rendering in certain parts of the final image. For example, this can be used to ensure that only the visible portion of a scrollable window is rendered.
Full-screen effects
After rendering, the GPU applies any final full-screen effects or overlays. These include:
- Bloom effects, such as representing the glow of a lamp
- Post-process anti-aliasing, such as FXAA
Audio
In addition to handling app logic, submitting work to the GPU, and performing physics calculations, the CPU is responsible for submitting audio data to the hardware.
Having to manage and play back many audio streams at once can impact your app’s performance. Decompressing compressed audio files, fetching streamed audio, or applying head-related transfer functions to render spatial audio can take up significant CPU resources.
These costs reduce the amount of graphics-related work that the CPU can perform.
Estimating hardware-based resource constraints
It is a good idea early in development to roughly estimate resource constraints based on the target hardware. This might not always be feasible, but testing with placeholder assets throughout development can help set guidelines for what is possible and minimize the cost of optimization or design changes later.
Whenever possible, test performance on a variety of devices to ensure consistent results across all of the devices your app supports. See Thermal issues for additional guidance on testing thermal limits of various devices.
Here are some rough device specific guidelines to get you started:
Pixel and Pixel 2
- 100 total draw calls (50 per eye)
- 600k total vertices (300k per eye)
- No more than 2 texture lookups per shader
- Minimize transparent overdraw
- No post-process effects
- Use forward rendering
- Use only 32-bit color and 24/8-bit depth buffers
- 2x MSAA or lower
Diagnosing performance issues
The first step in diagnosing performance problems is determining whether your app is CPU- or GPU-bound.
The CPU and GPU must collaborate and carefully coordinate the work involved in rendering each frame. While each is able to perform most of its work independently of the other, they must also wait for each other at least once per frame. This means that for most apps, frame rate is limited primarily by either the app's CPU workload or its GPU workload.
Tools for diagnosing issues
Identifying the cause of performance issues can be difficult and might require multiple tools. The most important tools measure frame time on the device and identify performance spikes.
The Unity Profiler and the Unreal Profiler are both useful tools for their respective engines.
For real-time monitoring of performance on device, you also can enable Performance HUD.
For monitoring specific graphics calls, you can use GAPID.
For detailed performance analysis, you can use Android Studio's systrace.
Use your app as its own diagnostic tool. Make temporary changes to your app to confirm where performance bottlenecks are so that you can spend time effectively on fixing the issues impacting your app's rendering performance.
Vsync and frame rate issues
In VR, Vsync is always on, which means your frames will always be rendered on a Vsync interval (1/60th of a second on phones). If you are rendering at less than 60 fps, you might see a ‘sawtooth’ pattern, where some frames appear to render at 60 fps, but then some render at 30 fps. This is because some frames missed the Vsync interval and had to wait for the next sync point.
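To see whether frames are missing the Vsync interval, a simple per-frame timer can flag frames that exceed the 16.6 ms budget. The following is a minimal Unity (C#) sketch, assuming a 60 Hz display; the class name and jitter tolerance are illustrative, and a profiler will give you far more detail.

```csharp
using UnityEngine;

// Attach to any GameObject to log frames that miss the 60 fps (16.6 ms) budget.
public class FrameTimeLogger : MonoBehaviour
{
    const float BudgetMs = 1000f / 60f; // one Vsync interval at 60 Hz

    void Update()
    {
        float frameMs = Time.unscaledDeltaTime * 1000f;
        if (frameMs > BudgetMs + 0.5f) // small tolerance for timer jitter
        {
            Debug.LogWarning("Missed Vsync: frame took " + frameMs.ToString("F1") + " ms");
        }
    }
}
```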
Common performance bottlenecks
Once you determine the origin of your app’s current performance bottleneck, you can start making adjustments to fix the specific problem(s). The following common rendering bottleneck areas are listed in order of likelihood, depending on whether your app is CPU- or GPU-bound.
GPU optimization: fill rate
“Fill rate” generally refers to a few separate problems:
- Frame buffer bandwidth
- Fragment shader processing
- Texture bandwidth
These issues are grouped together as fill rate because all three depend on the number of pixels being rendered.
Diagnosing fill rate problems
In mobile VR, fill rate is the most common cause of performance issues. It is also easy to diagnose.
As a test, temporarily decrease the rendering resolution of your app (see the sketch after this list).
- If the frame rate increases, fill rate is the primary cause of your performance problem. This means that your app’s frame rate is fill-bound.
- If the frame rate does not increase, fill rate is not currently causing a performance bottleneck in your app. In this case, optimize other aspects of your app and then repeat the fill rate test.
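One way to run this resolution test in a Unity project is to scale the eye-buffer viewport at runtime. The following is a minimal sketch, assuming a Unity version where UnityEngine.XR.XRSettings is available (older versions expose a similar control on VRSettings); the scale value and class name are illustrative.

```csharp
using UnityEngine;
using UnityEngine.XR;

// Temporarily renders to a smaller portion of the eye buffer. If the frame rate
// rises at the lower resolution, the app is fill-bound.
public class FillRateTest : MonoBehaviour
{
    [Range(0.5f, 1.0f)]
    public float viewportScale = 0.7f; // 0.7 x 0.7 ≈ half as many pixels as full resolution

    void Start()
    {
        XRSettings.renderViewportScale = viewportScale;
    }
}
```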
Fixing fill rate problems
Adjust post-process effects or additional render targets (shadows)
- Full-frame post-process effects are very expensive and should not be used.
- High-resolution shadow maps are expensive. Disable shadows, decrease shadow resolution, consider baking shadows, or try re-rendering shadow maps only periodically (see the sketch after this list).
- Use forward rendering. Do not use deferred rendering on mobile hardware.
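For the shadow adjustments mentioned above, Unity exposes global quality settings that can be changed from script or per quality level in the editor. This is a minimal sketch, not a recommended configuration; the specific values are illustrative.

```csharp
using UnityEngine;

// Cheapens or disables real-time shadows from script. These can also be changed
// per quality level in the editor (Project Settings > Quality).
public class ShadowBudget : MonoBehaviour
{
    void Start()
    {
        QualitySettings.shadowResolution = ShadowResolution.Low; // smaller shadow maps
        QualitySettings.shadowDistance = 20f;                    // draw shadows only near the camera
        // Or turn real-time shadows off entirely:
        // QualitySettings.shadows = ShadowQuality.Disable;
    }
}
```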
Improve render target bandwidth
- Reduce the render target resolution by specifying a smaller clip region on an existing target.
- Decrease or disable MSAA.
- Disable HDR. Never use HDR buffers in mobile VR.
- Reduce the amount of transparent overdraw.
Improve texture bandwidth
- Always use texture compression, and make sure you are using a format supported by the target hardware. ETC2 and ASTC are generally well supported.
- Generally, you should always enable mip-maps. Without mip-maps, minified textures become far more expensive due to poor cache locality of texture lookups.
  - Note that decreasing the resolution of a texture that has mip-maps enabled will not help solve bandwidth issues.
  - Adjust mip-map bias judiciously. Setting too large a value is functionally equivalent to disabling mip-maps.
- Avoid anisotropic and trilinear filtering unless absolutely necessary.
  - Prefer anisotropic and/or trilinear filtering to disabling mip-maps.
- Manually sort geometry to maximize the benefits of early-z rejection.
  - Game engines generally handle this for you.
- Reduce the number of texture lookups per pixel.
  - Try temporarily switching all materials to an unlit, solid-color shader (see the sketch after this list). If performance improves substantially, excessive texture bandwidth is very likely the cause.
- Bake data (like ambient occlusion) into vertices rather than using textures.
- Combine separate grayscale textures into a single RGB texture to minimize the number of texture lookups.
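The unlit-material test above can be scripted so it is easy to apply during development. The following is a minimal Unity (C#) sketch, assuming the built-in Unlit/Color shader is available; the class name is illustrative, and the swap is only a temporary diagnostic because it replaces materials for the running session.

```csharp
using UnityEngine;

// Temporarily replaces every material in the scene with a solid-color unlit material.
// If the frame rate improves substantially, texture bandwidth (or fragment cost)
// is very likely the bottleneck.
public class UnlitOverride : MonoBehaviour
{
    Material unlit;

    void Start()
    {
        unlit = new Material(Shader.Find("Unlit/Color")) { color = Color.gray };
        foreach (Renderer r in FindObjectsOfType<Renderer>())
        {
            Material[] mats = r.materials;      // instance array for this renderer
            for (int i = 0; i < mats.Length; i++) mats[i] = unlit;
            r.materials = mats;
        }
    }
}
```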
Reduce fragment processing costs
- Try using simpler shaders.
- Try temporarily switching all materials to an unlit version (with the same number of texture lookups). If performance improves substantially, expensive fragment shaders are most likely the problem.
- Move expensive fragment operations to the vertex shader.
- Avoid expensive fragment operations like branching on dynamic values, looping over dynamic values, or expensive math operations.
- Use half-precision math wherever possible.
  - Be sure to test results on mobile hardware. Desktop hardware often does not support half-precision values, which can hide problems if you preview content only on desktop.
CPU optimization
There are many tasks that might over-utilize the CPU, including draw calls, audio, physics, custom scripts, UI, and garbage collection.
Tips for managing your app’s CPU budget:
- Physics calculations can be costly if your app is using a physics engine.
- Game logic complexity can increase CPU costs.
- Avoid per-frame object creation. Use object pooling instead to help avoid garbage collection while your app is running.
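For example, here is a minimal Unity (C#) sketch of an object pool, assuming a prefab-based workflow; the class and member names are illustrative.

```csharp
using System.Collections.Generic;
using UnityEngine;

// A minimal pool that reuses inactive instances instead of calling Instantiate/Destroy
// per frame, avoiding per-frame allocations and the garbage collection they trigger.
public class ObjectPool : MonoBehaviour
{
    [SerializeField] GameObject prefab;
    [SerializeField] int initialSize = 32;
    readonly Queue<GameObject> pool = new Queue<GameObject>();

    void Awake()
    {
        for (int i = 0; i < initialSize; i++)
        {
            GameObject go = Instantiate(prefab, transform);
            go.SetActive(false);
            pool.Enqueue(go);
        }
    }

    public GameObject Get(Vector3 position)
    {
        GameObject go = pool.Count > 0 ? pool.Dequeue() : Instantiate(prefab, transform);
        go.transform.position = position;
        go.SetActive(true);
        return go;
    }

    public void Release(GameObject go)
    {
        go.SetActive(false);
        pool.Enqueue(go);
    }
}
```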
To identify specific CPU issues, use a profiler such as the Unity Profiler or the Unreal Profiler to diagnose specific causes related to scripts, audio processing, or physics.
Audio
If you are spending a large amount of CPU time processing audio, consider the following optimizations:
- Add audio and test its performance early in development. Adding audio late in development often causes performance issues.
- Reduce the sampling rate of audio clips to 22 kHz, or even 11 kHz, if possible.
- Reduce the number of clips playing simultaneously. If you intend to play more than 10 clips at once, test this as early as possible to account for the cost (a sketch of a simple voice limit follows this list).
- Utilize hardware decompression for a single source (or as many as are supported in hardware), but be careful not to decompress audio in software, as this can be expensive.
- Monitor the performance impact of streaming sources (by switching streaming on and off).
- Spatial audio processing costs multiply across all of the sound sources in your app. Use only the sound sources you need to create your app’s user experience. If your app includes numerous audio sources, manage rendering costs carefully.
- Avoid using stereo audio samples if your app only requires mono.
- For spatial audio, use sounds without added runtime reverb unless it is necessary.
- If possible, load audio into memory up front to avoid the cost of fetching streamed audio files.
- For compressed audio files, evaluate the cost of decoding the particular codec used. Some codecs are more space efficient than others, but might be more expensive to decompress.
- Reduce or remove audio effects.
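As an example of limiting simultaneous clips, the following Unity (C#) sketch caps the number of active voices; the class name, the way sources are managed, and the cap of 10 are illustrative assumptions.

```csharp
using UnityEngine;

// Skips new sound requests once a fixed number of clips are already playing,
// keeping the CPU cost of audio mixing and spatialization bounded.
public class AudioVoiceLimit : MonoBehaviour
{
    public int maxVoices = 10;
    public AudioSource[] managedSources; // the sources this limiter is allowed to use

    // Returns true if the clip was started, false if the voice budget was used up.
    public bool TryPlay(AudioClip clip, Vector3 position)
    {
        int playing = 0;
        AudioSource free = null;
        foreach (AudioSource source in managedSources)
        {
            if (source.isPlaying) playing++;
            else if (free == null) free = source;
        }

        if (playing >= maxVoices || free == null) return false;

        free.transform.position = position;
        free.clip = clip;
        free.Play();
        return true;
    }
}
```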
Draw calls
If you determined (Unity, Unreal) that your app's CPU render time is a bottleneck, it is likely that you are attempting to render too many objects and your app is draw-bound. Each mesh-material pair that is submitted to the GPU incurs some CPU overhead.
There are a number of ways to reduce this cost:
- Use multithreaded rendering. This will spread out the CPU cost across multiple threads.
- Reduce the number of draw calls by combining static objects that use the same materials (Unity, Unreal), or pre-combine mesh objects in a modeling program and create a combined texture atlas (see the sketch after this list).
- Be aware that merging objects will likely increase your vertex count, as the offscreen portion of combined objects will still be submitted to the GPU for vertex processing. Therefore if you manually merge objects, it’s preferable to merge objects that are usually rendered together.
- Review performance before and after these changes to ensure that the changes are beneficial and actually improve performance.
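In Unity, static geometry under a common root can also be combined at runtime. This is a minimal sketch, assuming the meshes share materials and never move after the call; the class and field names are illustrative.

```csharp
using UnityEngine;

// Combines the static meshes under a root object at runtime so they can be drawn
// in fewer draw calls. Meshes must share materials to actually be batched together.
public class BatchStaticGeometry : MonoBehaviour
{
    [SerializeField] GameObject staticRoot;

    void Start()
    {
        // Children of staticRoot must not move after this call.
        StaticBatchingUtility.Combine(staticRoot);
    }
}
```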
GPU optimization: vertex processing
When optimizing, developers often start by reducing poly count. However, this is often premature. In fact, if adding geometric detail lets you remove a normal map or additional texture lookups, it can actually improve your frame rate.
In isolated tests, the Google Pixel and Google Pixel XL can render as many as 1 million vertices per eye at 60 fps. In practice, you should target a number much lower than that, as your app will be doing more than just rendering vertices. Furthermore, vertex processing power is hardware dependent, so results will vary from device to device.
To reduce vertex count, consider the following optimizations:
- Create LOD (level of detail) meshes with lower vertex counts to use at greater distances (see the sketch after this list).
- Break up large models into parts so that offscreen parts can be frustum culled. Note that this increases the number of draw calls.
- Similar to the previous tip, disable static batching. This will also increase the draw call count.
- Reduce model detail in a modeling application.
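LODs are normally authored in the editor, but the following Unity (C#) sketch shows the idea of swapping to a lower-vertex mesh at distance; the renderers, transition heights, and class name are illustrative assumptions.

```csharp
using UnityEngine;

// Sets up a two-level LOD at runtime: a detailed mesh up close, a simplified mesh
// farther away. This only illustrates the concept; LODs are usually set up in the editor.
public class SimpleLodSetup : MonoBehaviour
{
    [SerializeField] Renderer highDetail;
    [SerializeField] Renderer lowDetail;

    void Start()
    {
        LODGroup lodGroup = gameObject.AddComponent<LODGroup>();
        lodGroup.SetLODs(new[]
        {
            new LOD(0.5f, new[] { highDetail }), // used while the object covers >50% of screen height
            new LOD(0.1f, new[] { lowDetail })   // used down to 10%, culled below that
        });
        lodGroup.RecalculateBounds();
    }
}
```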
Garbage collection (Unity)
If you see a slow frame every so often, it’s possible that the garbage collector is to blame.
To maintain a smooth frame rate in VR, make sure that the garbage collector does not run during gameplay. You can do this by reducing or eliminating per-frame allocations. For example, avoid triggering garbage collection each frame by creating all of the objects that you need at the beginning of your game.
Utilize pools of reusable game objects instead of creating and destroying objects during gameplay.
Optimizing for memory allocations is easier when considered early in development.
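As an example of eliminating a common per-frame allocation, the following Unity (C#) sketch reuses a preallocated buffer for physics queries instead of allocating a new results array every frame; the buffer size and class name are illustrative.

```csharp
using UnityEngine;

// Reuses a preallocated buffer for physics queries instead of allocating a new
// array every frame, which would feed the garbage collector.
public class AllocationFreeQueries : MonoBehaviour
{
    readonly RaycastHit[] hits = new RaycastHit[16]; // allocated once

    void Update()
    {
        // Physics.RaycastAll allocates a new array per call; the NonAlloc variant does not.
        int count = Physics.RaycastNonAlloc(transform.position, transform.forward, hits, 100f);
        for (int i = 0; i < count; i++)
        {
            // process hits[i] ...
        }
    }
}
```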
The Unity Memory Profiler provides a memory allocation summary. Use the Detailed view to get a snapshot of current allocations.
For more information, see Garbage Collection In Unity.
UI
UI rendering can impact performance because it can involve complex calculations of component layouts and other rendering considerations.
Consider simplifying UI layouts wherever possible.
Dynamic UI requires costly layout recalculations.
Consider adding layout constraints, such as specifying fixed sizes for UI components, to avoid layout recalculations for dynamically changing content.
Transparent UI, as well as opaque UI that the game engine often treats as transparent, has a higher rendering cost than ordinary opaque geometry.
UI is often drawn after the scene is rendered, resulting in overdraw.
UI rendering itself often results in multiple levels of overdraw as each layer of the UI is built up over multiple passes.
Use game engine tools (such as Unity's Scene view) to visualize overdraw.
Diagnose and remove UI bottlenecks
Try temporarily turning your UI off globally to see if performance improves. If it does, this indicates that your app has a UI bottleneck.
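One way to run this test in Unity is to disable every Canvas at runtime. This is a minimal sketch; the keyboard toggle is illustrative and would normally be wired to whatever debug input your project already uses.

```csharp
using UnityEngine;

// Press U to toggle every Canvas in the scene; if the frame rate recovers with the
// UI off, UI rendering is a bottleneck.
public class UiToggleTest : MonoBehaviour
{
    void Update()
    {
        if (Input.GetKeyDown(KeyCode.U))
        {
            foreach (Canvas canvas in FindObjectsOfType<Canvas>())
                canvas.enabled = !canvas.enabled;
        }
    }
}
```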
To address a UI bottleneck:
- Switch from transparent UI elements to opaque ones if you think overdraw might be an issue.
  - As noted above, game engines might treat opaque UI as transparent, so you might not see an immediate performance benefit.
- For UI with a complex layout, render from a static texture that mimics the layering you need.
- For performance-critical 2D visuals, consider using simple geometry and baked textures.
Thermal issues
If your performance drops suddenly after extended use, it’s possible that the device you are using is reaching its imposed thermal limits.
If you suspect thermal issues, check for the following:
- Android device logs indicating that the device is being thermally limited
- The device being noticeably warm to the touch
- Device temperature information reported by the Performance HUD
Tips for managing thermal issues:
- On Daydream, you can enable Sustained Performance Mode to throttle the CPU and GPU, reducing thermal output. This feature is intended for long-running apps, such as video players, and apps that do not require peak CPU and GPU performance. When enabled, you need to further optimize your app to run at 60 fps at the lower CPU and GPU clock speeds. For more details, see Unity's or Unreal's documentation.
- Enabling additional hardware features (like the camera or video decoding) increases the thermal load. Using Wi-Fi or a cellular modem to stream video is also very power intensive compared to viewing a video from the local filesystem.
When testing, keep in mind:
- If you test your app only for short periods of time, your phone has time to cool off between iterations. You might miss the fact that the device would overheat for a user running a session of more typical length.
- Testing your app while the phone is outside of the headset will result in a different thermal profile than a phone inside a headset.
Whenever possible, test on a variety of devices. Even if you perform extensive tests on your own device, this does not account for important performance and thermal characteristics of the devices that your users have:
- How much heat they produce
- How much heat they can dissipate
- Manufacturer and device specific configured thermal limits
Without testing on multiple devices, you might also miss these important characteristics of real-world user scenarios with your app:
- Whether the user is using a phone case
- How much insulation the case provides
- The starting temperature of the device when your app launches
- Background app activity on the device
- Whether the device network (Wi-Fi and mobile data) radios are being used
- Whether the device is charging
If your testing does not account for these characteristics, your app might not run for long on a user's own device before overheating or being thermally throttled so that it no longer runs at peak performance.