The Google Play EMM API has a default limit of 60,000 queries per minute for each EMM.
If you exceed the quota, then the Google Play EMM API returns HTTP 429 Too Many Requests.
To help ensure that you don't exceed these usage limits and to offer an optimal experience for
your users, consider implementing the best practices described in the section below.
Recommendations for staying below the API usage limits
When using the Google Play EMM API, there are some best practices that you can implement to distribute requests and reduce your risk of exceeding the usage limits.
Randomize start times and intervals
Activities such as syncing or checking in devices at the same time are likely to cause a significant spike in request volume. Instead of performing these activities at regularly scheduled intervals, you can distribute your request load by randomizing the intervals. For example, rather than syncing each device every 24 hours, sync each device at a randomly chosen interval between 23 and 25 hours. This spreads the requests out over time.
Similarly, if you run a daily job that makes many API calls in quick succession, consider starting the job at a random time each day to prevent making a high volume of requests for all your enterprise customers at the same time.
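The two randomization ideas above can be sketched as small scheduling helpers. This is a minimal illustration, not part of the Google Play EMM API; the function names are hypothetical.

```python
import random

# Hypothetical helper: delay (in hours) until a device's next sync,
# spread uniformly between 23 and 25 hours instead of a fixed 24.
def next_sync_delay_hours(min_hours=23.0, max_hours=25.0):
    return random.uniform(min_hours, max_hours)

# Hypothetical helper: start a daily batch job at a random offset within
# the day (in seconds) rather than at the same fixed time for every customer.
def daily_job_start_offset_seconds():
    return random.uniform(0, 24 * 60 * 60)
```

Scheduling each device (or each customer's daily job) with its own randomly drawn delay prevents the synchronized bursts that a fixed schedule produces.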
Use exponential backoff to retry requests
If you run jobs that consist of many API calls, use an exponential backoff strategy in response to reaching the quota. Exponential backoff is an algorithm that retries failed requests with exponentially increasing wait times between attempts. An example flow for implementing simple exponential backoff is as follows:
- Make a request to the Google Play EMM API.
- Receive an HTTP 429 response.
- Wait 2 seconds + random_time, then retry the request.
- Receive an HTTP 429 response.
- Wait 4 seconds + random_time, then retry the request.
- Receive an HTTP 429 response.
- Wait 8 seconds + random_time, then retry the request.
The random_time is typically a random number ranging from -0.5 * wait time to +0.5 * wait time. Choose a new random_time each time you retry your request. API calls that are required to complete user-facing actions can be retried on a more frequent schedule (0.5s, 1s, and 2s, for example).
Rate-limit batch processes
Each time a batched process reaches the quota, the latency of user actions that call the API increases. In situations like these, strategies such as exponential backoff may not be effective enough at maintaining low latency for user actions.
To avoid reaching the API’s usage limits repeatedly and increasing latency for user-facing actions, consider using a rate limiter for your batched processes (see Google’s RateLimiter). With a rate limiter you can adjust the rate of your API requests so that you consistently remain below the usage limits.
For example, start a batched process with a default rate limit of 50 QPS. As long as the API doesn’t return an error, increase the rate limit slowly (1% every minute). Each time you reach the quota, reduce your request rate by 20%. This adaptive approach results in a more optimal request rate while reducing latency for user-facing actions.
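The adaptive approach in the example above can be sketched as a small controller that tracks the target rate. This is a hypothetical illustration of the numbers in the text (50 QPS start, +1% per healthy minute, -20% on a quota error); in practice you would feed the resulting rate into an actual throttler such as Guava's RateLimiter.

```python
class AdaptiveRate:
    """Track an adaptive request rate for a batched process.

    Creep the rate up slowly while the API is healthy, and cut it
    sharply each time the quota is reached.
    """

    def __init__(self, start_qps=50.0, increase=0.01, backoff=0.20):
        self.qps = start_qps
        self.increase = increase  # +1% per error-free minute
        self.backoff = backoff    # -20% on each quota error

    def on_healthy_minute(self):
        # No errors in the last minute: raise the rate slowly.
        self.qps *= 1 + self.increase

    def on_quota_error(self):
        # Reached the quota: reduce the request rate by 20%.
        self.qps *= 1 - self.backoff
```

The slow increase and sharp decrease keep the process probing for the highest sustainable rate while backing well away from the limit whenever it is hit, which leaves headroom for latency-sensitive user-facing calls.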