Idempotency

Idempotency is the property that an operation can be applied multiple times without the result differing from that of the first application. In other words, multiple identical requests have the same effect as a single request.
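As a rough illustration of the definition, the sketch below contrasts an operation that is not idempotent with one that is. The account, the `debit` and `debit_once` functions and the `processed` store are all hypothetical names invented for this example; they are not part of any specification.

```python
from typing import Dict

# Hypothetical in-memory account, used only for illustration.
balance: Dict[str, int] = {"alice": 100}

def debit(account: str, amount: int) -> int:
    """Not idempotent: every call subtracts the amount again."""
    balance[account] -= amount
    return balance[account]

# Request ID -> result of the first application (assumed store).
processed: Dict[str, int] = {}

def debit_once(request_id: str, account: str, amount: int) -> int:
    """Idempotent: a repeated request ID returns the first result unchanged."""
    if request_id not in processed:
        processed[request_id] = debit(account, amount)
    return processed[request_id]

first = debit_once("req-1", "alice", 30)
second = debit_once("req-1", "alice", 30)  # identical request, no second debit
```

Calling `debit_once` twice with the same request ID leaves the balance where the first call left it, whereas calling `debit` twice would subtract the amount twice.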

Why Idempotency?

Idempotency has many benefits. One of the most straightforward is that it makes retries safe, which facilitates eventual consistency between systems. Records stay aligned between systems and all actions are easily traceable and auditable, which minimises reconciliation issues.

Idempotency prevents race conditions. It dictates that multiple identical requests from the same client do not result in a different final state, which guarantees that both systems converge on the same state of the resource. A request with a different idempotency key that changes the state of the resource does not break the idempotency principle.

Idempotency minimises state, as data is sent to the server by the client in such a way that requests can be understood in isolation, requiring less contextual state to be stored on the server. This improves performance and throughput by removing server load caused by retention of data.

For these reasons, idempotency has become a key technique in internet technologies. It is an integral component of the HTTP specification and of REST APIs, and it is designed into the Google Standard Payments specification as a critical aspect of the platform.

Idempotency and Google Standard Payments

In the Google Standard Payments (GSP) context, idempotency means that requests that have already been processed successfully are not reprocessed. The response from the completed processing is returned instead. However, if a previous request resulted in an error such as an HTTP 400 or 500, the GSP system retries the request with the same requestId, on the assumption that it will be reprocessed.

The way the specification is defined implies that requests answered with HTTP 200 OK are not retried. However, if a transfer request were rejected with a 500 error, for example, the GSP system would retry the same request. This behaviour is helpful if, for example, the counterpart server was down during the first request: retrying gives the request another chance to be processed successfully.
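The retry behaviour described above can be sketched from the sender's side. This is a minimal illustration only, not Google's implementation: the `send_with_retries` function, the `send` callable, the attempt limit, the flat request layout and the millisecond timestamp format are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple

Request = Dict[str, object]
Response = Tuple[int, Dict[str, object]]  # (HTTP status, body)

def send_with_retries(send: Callable[[Request], Response],
                      request: Request,
                      max_attempts: int = 5,
                      delay_seconds: float = 0.0) -> Response:
    """Resend the same request until it returns 200 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        # Same requestId and details on every attempt; only the
        # timestamp is refreshed (assumed to be epoch milliseconds).
        attempt_request = dict(request, requestTimestamp=int(time.time() * 1000))
        status, body = send(attempt_request)
        if status == 200:
            return status, body  # success is never retried
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    return status, body  # last error after exhausting attempts

# Hypothetical counterpart that is down for the first two attempts.
seen_ids = []

def flaky_send(req: Request) -> Response:
    seen_ids.append(req["requestId"])
    if len(seen_ids) < 3:
        return 500, {"error": "temporarily down"}
    return 200, {"result": "SUCCESS"}

original = {"requestId": "req-42", "amount": 10}
status, body = send_with_retries(flaky_send, original)
```

Every attempt carries the same requestId, so the receiving side can recognise a retry without any dedicated retry flag.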

The GSP system automatically retries some requests to ensure the state on Google’s side is the same as the state of the partner system. A retried request must not be processed as another transaction, which is why idempotency is so important. An integrator should not reprocess a request that has already been processed successfully; in such a case, the previous response should be sent instead.
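On the integrator side, one way to meet this requirement is to store each successful response under its requestId and replay it when the same requestId arrives again. The sketch below assumes an in-memory store and a simplified flat request; `handle_capture`, `ledger`, `completed` and the field names other than requestId and requestTimestamp are hypothetical.

```python
from typing import Dict

ledger: Dict[str, int] = {"user-1": 100}  # hypothetical user balances
completed: Dict[str, dict] = {}           # requestId -> stored successful response

def handle_capture(request: dict) -> dict:
    request_id = request["requestId"]
    # Already processed successfully: replay the stored response
    # without touching the ledger again.
    if request_id in completed:
        return completed[request_id]
    # First time this requestId is seen: process it and remember the result.
    ledger[request["account"]] -= request["amount"]
    response = {"status": 200, "result": "SUCCESS", "requestId": request_id}
    completed[request_id] = response
    return response

first = handle_capture({"requestId": "req-7", "account": "user-1",
                        "amount": 25, "requestTimestamp": 1000})
retry = handle_capture({"requestId": "req-7", "account": "user-1",
                        "amount": 25, "requestTimestamp": 2000})
```

In a production system the lookup, the debit and the store would need to happen atomically, for example inside one database transaction, so that two concurrent copies of the same request cannot both be processed.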

When all instruments are taken into account, there are currently over 15 GSP implementations operating across more than 25 countries in production that all conform to the standard. If the implementation of a core feature like idempotency were to be at odds with expected behaviour, it’s highly likely it would cause ongoing issues and future extensibility would also be constrained. Given the scale of implementations in production, it is not possible to relax these constraints.

Features of GSP idempotency

F1: No side effects on retries.

  • F1.1: After a request is successfully processed, subsequent retries do not lead to side effects. This is key to ensuring the consistency and correctness of the result regardless of communication failures.

F2: The content of a request is identical between initial attempt and retry.

  • F2.1: There is no need to call a different endpoint or change the request.
  • F2.2: Legacy techniques such as fields that flag a request as a retry are superfluous, particularly as such fields are commonly used only to return an error when the original request was never seen. The first time the system sees a request, it processes it in the same way regardless of whether it is a retry.
  • F2.3: Systems do not need to reconcile their state with each other between responses.

Exceptions:

  • E1: In the case of temporary failures, subsequent responses may differ from the initial failure, as the request is retried.

F3: It just works.

  • F3.1: There are no idempotency outages. There is redundancy that is resilient to both planned maintenance and unexpected data center outages.
  • F3.2: Idempotency is an industry-standard solution to the problem of mitigating communication problems in distributed systems.
  • F3.3: This logic exists in all robust distributed systems, either on the clients or the server. Implementing the logic on the clients simplifies the server, but complicates all the clients. The cleanest solution is to do it on the server.

Examples

The examples below illustrate how idempotency works best.

Example 1: Two requests, connectivity lost

Situation:

  • T0: Google sends a capture request to the integrator.
  • T1: The integrator server receives this request and processes it successfully.
  • T2: Google's server loses power prior to receiving the response in T1.
  • T3: Google's server power is restored and the same capture request is sent with all the same parameters (same request ID and request details but updated requestTimestamp) to the integrator's server.

Outcome:

In this case the integrator server must return the same reply it gave at T1, with all parameters the same except for responseTimestamp. The user is debited only once, when the request is processed at T1. The retry at T3 has no monetary impact on the user.

Example 2: Two requests, first request during maintenance

Situation:

  • T0: Integrator server's database is down for maintenance.
  • T1: Google sends a request to the integrator.
  • T2: Integrator correctly returns UNAVAILABLE status code.
  • T3: Google's server receives the response and schedules a retry.
  • T4: Integrator server's database comes back online.
  • T5: Google resends the request from T1 with the same request ID and request details but an updated requestTimestamp.
  • T6: Integrator server receives request and returns an OK status code along with full response.

Outcome:

In this case the integrator server must fully process the request received at T6 and return OK with the appropriate response, rather than returning HTTP 503 (UNAVAILABLE) again. Note that while the system is unavailable, Google may make repeated requests similar to T1, and each should result in a response similar to T2. Eventually, T5 and T6 will occur.
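The timeline in Example 2 can be traced with a small sketch in which temporary failures are deliberately not stored, so that a later retry is processed from scratch. The `Integrator` class, its `database_online` flag and the response bodies are assumptions made for illustration.

```python
from typing import Dict, Tuple

class Integrator:
    """Hypothetical integrator server with a database that may be offline."""

    def __init__(self) -> None:
        self.database_online = False
        self.completed: Dict[str, dict] = {}  # requestId -> successful body
        self.processed_count = 0

    def handle(self, request: dict) -> Tuple[int, dict]:
        request_id = request["requestId"]
        if request_id in self.completed:
            return 200, self.completed[request_id]
        if not self.database_online:
            # Temporary failure: nothing is stored, so a later retry
            # with the same requestId is processed afresh.
            return 503, {"error": "UNAVAILABLE"}
        self.processed_count += 1
        body = {"result": "SUCCESS", "requestId": request_id}
        self.completed[request_id] = body
        return 200, body

server = Integrator()                                                  # T0: database down
t2 = server.handle({"requestId": "req-9", "requestTimestamp": 1000})   # T1, T2
server.database_online = True                                          # T4
t6 = server.handle({"requestId": "req-9", "requestTimestamp": 2000})   # T5, T6
```

The first attempt returns 503 and leaves no trace, while the retry after the database comes back is processed exactly once and returns 200.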

Example 3: Two requests, Google error in request

Situation:

  • T0: Google sends a request to the integrator.
  • T1: The integrator server receives this request and processes it successfully.
  • T2: Google's server loses power prior to receiving the response in T1.
  • T3: Google's server power is restored and the request is resent with the same request ID and an updated requestTimestamp. Unfortunately, because of an error on Google's side, some of the other parameters differ from those of the original request.

Outcome:

In this case the integrator server should reply with an HTTP 412 (PRECONDITION FAILED) error code, which signals to Google that there is an error in its request.
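One possible way to detect the situation in Example 3 is to keep the details of the original request alongside its response and compare them on every retry, ignoring requestTimestamp because it is expected to change. The sketch below is an assumption about how an integrator might do this; `comparable`, `handle`, `completed` and the error body are hypothetical.

```python
from typing import Dict, Tuple

# requestId -> (request details used for comparison, stored response body)
completed: Dict[str, Tuple[dict, dict]] = {}

def comparable(request: dict) -> dict:
    """Request details without requestTimestamp, which changes on every retry."""
    return {k: v for k, v in request.items() if k != "requestTimestamp"}

def handle(request: dict) -> Tuple[int, dict]:
    request_id = request["requestId"]
    if request_id in completed:
        original, response = completed[request_id]
        if comparable(request) != original:
            # Same requestId but different parameters: report the mismatch.
            return 412, {"error": "request differs from the one already processed"}
        return 200, response
    response = {"result": "SUCCESS", "requestId": request_id}
    completed[request_id] = (comparable(request), response)
    return 200, response

ok = handle({"requestId": "req-3", "amount": 10, "requestTimestamp": 1000})
same = handle({"requestId": "req-3", "amount": 10, "requestTimestamp": 2000})
bad = handle({"requestId": "req-3", "amount": 99, "requestTimestamp": 3000})
```

A retry that differs only in requestTimestamp receives the stored response, while a retry whose amount has changed is rejected with 412 and is not processed.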