Processing Environments

Earth Engine has different environments for processing data: interactive and batch. These two environments (or "realms") handle different types of queries and have very different performance characteristics, so it's important to understand when and how to use each.

Interactive environment

Also called the "synchronous" or "online" stack, this environment is optimized for answering small requests which finish quickly (responses are limited to tens of megabytes of data and must finish processing within five minutes). Many requests can be made in parallel up to the quota limits.

Endpoints

The interactive environment is composed of different API endpoints: standard and high volume.

Standard endpoint

The standard endpoint is appropriate for most human-driven usage, and it's what powers the Code Editor and Earth Engine Apps. Specifically, this endpoint is best suited for latency-sensitive applications which involve a low volume of concurrent, non-programmatic requests.

High-volume endpoint

The high-volume endpoint is designed to handle a higher volume of requests in parallel than the standard endpoint. Key differences include:

  • Higher latency: The high-volume endpoint has higher average latency per request.
  • Less caching: It provides less caching of intermediate results, so complex queries may require more compute time.
  • Best for automated, small queries: The high-volume endpoint excels at handling many programmatic requests, but is most suitable for simple queries that don't require aggregation (like fetching tiles from prebuilt images).

For complex analyses that need efficient caching, the standard API endpoint may be preferable. The high-volume endpoint is optimized for high-throughput, low-computation tasks. Complex queries typically require more EECU-time when using the high-volume endpoint than they do in the regular online endpoint.

Use of the high-volume endpoint

Python client

When initializing the earthengine library, pass in an opt_url parameter and set it to https://earthengine-highvolume.googleapis.com. As always, be sure to also pass in proper credentials and specify the Cloud project. For example:

ee.Initialize(
    credentials=credentials,
    project='my-project',
    opt_url='https://earthengine-highvolume.googleapis.com'
)

JavaScript client

When initializing the earthengine library using ee.initialize(), pass https://earthengine-highvolume.googleapis.com for the first parameter.

REST API

Direct your REST requests to https://earthengine-highvolume.googleapis.com (instead of https://earthengine.googleapis.com, as shown in the REST API Quickstart, for example).

Batch environment

Also called the "asynchronous" or "offline" stack, this environment is optimized for high-latency parallel processing of large amounts of data. Requests are submitted as tasks to batch processing endpoints, usually by calling data import or export functions (e.g., Export.* and ee.batch.*) from the Earth Engine client libraries. Each batch task has a maximum lifetime of ten days. Each project supports up to 3000 pending tasks, but each individual user is limited to a small number of concurrently running tasks.

Task lifecycle

Tasks are submitted to a queue and ordered by their priority (highest first) and submission time (earliest first). Tasks change from the SUBMITTED (queued) state to the RUNNING state when they're assigned to a batch processor. Each processor is responsible for orchestrating a varying number of batch workers to run the computation and produce the task's results. The number of workers for a task is determined by the EE service's ability to parallelize the job and is not user-configurable.

When using a Cloud project, tasks are visible to anyone with the project-level permission to list tasks. If the project is registered for paid Earth Engine access, tasks are organized in a project-wide queue; if the project is registered for unpaid (research) access, tasks are scheduled independently for each individual but still visible across users of the project.

Tasks complete successfully when they create the necessary artifacts (Earth Engine assets, files in Google Cloud Storage, etc.).

Task management

Tasks can be viewed and cancelled using the following interfaces:

Task failures

If a task fails for a reason which won't be fixed by retrying (e.g., the data are invalid), the task will be marked as FAILED and won't be run again.

If a task fails for a reason which could be intermittent (e.g., it timed out when running a computation), Earth Engine will automatically attempt to retry it and populate the retries field. Tasks can fail up to five times, and the final failure will cause the entire task to be marked as FAILED.

Task ID

Each task has an alphanumeric ID of the form 3DNU363IM57LNU4SDTMB6I33. These can be viewed or obtained through our task management interfaces. If you are starting tasks programmatically, you get the task ID from ee.data.newTaskId. When requesting help to debug an export or ingestion task, provide this task ID as a copyable string (not a screenshot).

List of task states

Tasks can have the following state values:

  • UNSUBMITTED, still pending on the client
  • READY, queued on the server
  • RUNNING, currently running
  • COMPLETED, completed successfully
  • FAILED, completed unsuccessfully
  • CANCEL_REQUESTED, still running but has been requested to be cancelled (i.e., not a guarantee that the task will be cancelled)
  • CANCELLED, cancelled by the owner

Task priority

Task priority is a mechanism for controlling the order of tasks in the queue. Higher priority tasks get scheduled before other pending tasks with lower priorities, regardless of their submission time. The default task priority is 100.

The ability to set other priorities (higher or lower) on export tasks is only available for users of projects that are registered for paid Earth Engine access. Changing the priority of an export task doesn't affect how it's scheduled relative to any import tasks, since the two types of tasks are scheduled separately.

Example: using task priorities

Consider the following task list, where tasks 1-5 are submitted in their natural order with the default priority. They run in the order they were submitted, since the priorities are all the same, and, since two batch processing slots are available for this project, two run concurrently (the first and second submitted).

Task name           State      Priority
---------------------------------------
MyDefaultTask5      READY      100
MyDefaultTask4      READY      100
MyDefaultTask3      READY      100
MyDefaultTask2      RUNNING    100
MyDefaultTask1      RUNNING    100

Submitting a new task, MyHighPriorityTask1, won't affect the running tasks:

Task name           State      Priority
---------------------------------------
MyHighPriorityTask    READY      500
MyDefaultTask5        READY      100
MyDefaultTask4        READY      100
MyDefaultTask3        READY      100
MyDefaultTask2        RUNNING    100
MyDefaultTask1        RUNNING    100

After one of the running tasks completes, the pending task with the highest priority will run (in this case, our high-priority task):

Task name             State      Priority
-----------------------------------------
MyHighPriorityTask    RUNNING    500
MyDefaultTask5        READY      100
MyDefaultTask4        READY      100
MyDefaultTask3        READY      100
MyDefaultTask2        COMPLETED  100
MyDefaultTask1        RUNNING    100