This blog was originally posted here.
Let’s first have a closer look at the definition of idempotence, according to Wikipedia.
“Idempotence is the property of certain operations in mathematics and computer science, that can be applied multiple times without changing the result beyond the initial application.” The meaning of this definition is explained as: “a function is idempotent if, whenever it is applied twice to any value, it gives the same result as if it were applied once; i.e., ƒ(ƒ(x)) ≡ ƒ(x)”.
Applied to integration, this means that a system is idempotent when it can process a specific message multiple times while still ending up with the same result. As a real-life example, an ERP system is idempotent if only one sales order is created, even if the CreateSalesOrder command message was submitted multiple times by the integration layer.
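To make the ƒ(ƒ(x)) ≡ ƒ(x) property concrete, here is a minimal sketch. Both functions and the sample values are hypothetical illustrations, not taken from any real system.

```python
def normalize_order_number(value: str) -> str:
    """Idempotent: a second application leaves the result unchanged."""
    return value.strip().upper()


def add_order_line(lines: list) -> list:
    """Not idempotent: every application appends another line."""
    return lines + ["1 x widget"]


x = "  so-1001 "
once = normalize_order_number(x)
twice = normalize_order_number(normalize_order_number(x))
print(once == twice)  # f(f(x)) equals f(x) for the idempotent function

lines_once = add_order_line([])
lines_twice = add_order_line(add_order_line([]))
print(lines_once == lines_twice)  # the property does not hold here
```

The first function behaves like an idempotent receiver: submitting the same input again has no further effect. The second behaves like a receiver that creates a new record on every submission.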
Often, customers ask the integration layer to perform duplicate detection, assuming that the receiving systems then no longer need to be idempotent. This assumption is only partially true. Duplicate detection on the middleware layer can discard messages that are received more than once. However, even when a message is received only once by the middleware, it may still end up multiple times in the receiving system. Below you can find two examples of such edge cases.
Web service communication
Nowadays, integration increasingly leverages the power of APIs. APIs are built on top of the HTTP protocol, which can cause issues due to its request/response nature. Let’s consider the following situations:
- The request succeeds and the response arrives: all is fine. The service processed the request successfully and the client is aware of this.
- The request fails and the client is informed: there is also no problem. The service failed to process the request and the client knows about it. The client will retry, and eventually the service will process the message only once.
- The request succeeds but the response is lost: this is a dangerous situation in which client and service are misaligned on the status. The service successfully processed the message, but the HTTP 200 response never reached the client. The client times out and retries. In this case the message is processed twice by the server, so idempotence might be needed.
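The lost-response situation can be simulated without any real network. The sketch below is an in-memory illustration under stated assumptions: `OrderService`, `LossyNetwork` and `send_with_retry` are hypothetical names, and the "network" simply drops the first response after the service has already done its work.

```python
class OrderService:
    """Hypothetical non-idempotent service: every call creates an order."""

    def __init__(self):
        self.orders = []

    def create_sales_order(self, payload: dict) -> int:
        self.orders.append(payload)
        return 200


class LossyNetwork:
    """Hypothetical transport that loses a number of responses on the way back."""

    def __init__(self, service: OrderService, responses_to_drop: int = 1):
        self.service = service
        self.responses_to_drop = responses_to_drop

    def post(self, payload: dict) -> int:
        # The service processes the request before the response is lost.
        status = self.service.create_sales_order(payload)
        if self.responses_to_drop > 0:
            self.responses_to_drop -= 1
            raise TimeoutError("response never reached the client")
        return status


def send_with_retry(network: LossyNetwork, payload: dict, max_attempts: int = 3):
    """Retry on timeout, as a typical client would."""
    for attempt in range(1, max_attempts + 1):
        try:
            return network.post(payload), attempt
        except TimeoutError:
            continue
    raise RuntimeError("giving up after %d attempts" % max_attempts)


service = OrderService()
status, attempts = send_with_retry(LossyNetwork(service), {"order": "SO-1"})
print(status, attempts, len(service.orders))
```

From the client's point of view the call succeeded on the second attempt, yet the service holds two orders for the same request. That mismatch is exactly what an idempotent receiver has to absorb.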
When a message queueing system is used, idempotency is required if the queue offers at-least-once delivery. Let’s take Azure Service Bus queues as an example. Service Bus queues support the PeekLock mode. When you peek-lock a message from the queue, it becomes invisible to other receivers for a specific time window. You can explicitly remove the message from the queue by executing a Complete command.
Consider the following scenario: the client peeks the message from the queue and sends it to the service. Server-side processing goes fine and the client receives the confirmation from the service. However, the client is not able to complete the message because of an application crash or a network interruption. In this case, the message becomes visible again on the queue and is presented to the service a second time. As a consequence, idempotence might be required.
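The redelivery scenario can be sketched with a toy in-memory queue. Note that `PeekLockQueue` is a hypothetical stand-in that only mimics peek-lock semantics; it is not the Azure Service Bus SDK, and lock expiry is triggered manually instead of by a timer.

```python
class PeekLockQueue:
    """Toy in-memory queue that mimics peek-lock semantics (not the Azure SDK)."""

    def __init__(self, messages):
        self._visible = list(messages)
        self._locked = {}

    def receive(self):
        """Lock the next message: it stays on the queue but becomes invisible."""
        if not self._visible:
            return None
        message = self._visible.pop(0)
        self._locked[message["id"]] = message
        return message

    def complete(self, message_id: str) -> None:
        """Remove a locked message from the queue for good."""
        del self._locked[message_id]

    def expire_locks(self) -> None:
        """Simulate the lock timeout: uncompleted messages become visible again."""
        self._visible.extend(self._locked.values())
        self._locked.clear()


processed = []  # what the (non-idempotent) service has handled
queue = PeekLockQueue([{"id": "m1", "body": "CreateSalesOrder SO-1"}])

# First attempt: the service processes the message, then the client crashes
# before it can call complete().
message = queue.receive()
processed.append(message["body"])
queue.expire_locks()

# Second attempt: the same message is delivered again and completed this time.
message = queue.receive()
processed.append(message["body"])
queue.complete(message["id"])

print(processed)
```

The service ends up handling the same command twice, even though the queue held only one message.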
The above scenarios show that duplicate data entries can be avoided most of the time, but in specific edge cases a message might still be processed twice. Within the business context of your project, you need to determine whether this is an issue. If 1 out of 1000 emails is sent twice, this is probably not a problem. If, however, 1 out of 1000 sales orders is created twice, this can have a huge business impact. The problem can be resolved by implementing exactly-once delivery or by introducing idempotent receivers.
The options to achieve exactly-once delivery on a protocol level are rather limited. Exactly-once delivery is very difficult to achieve between systems of different technologies. Attempts to provide an interoperable exactly-once protocol, such as SOAP WS-ReliableMessaging, ended up very complex and often not interoperable in practice. If the integration remains within the same technology stack, some alternative protocols can be considered. On a Windows platform, Microsoft Distributed Transaction Coordinator can ensure exactly-once delivery (or, more accurately, exactly-once processing). The BizTalk Server SQL adapter and the NServiceBus MSMQ and SQL transports are examples that leverage this transactional message processing.
On the application level, the integration layer could be made responsible for first checking with the service whether the message was already processed. If this turns out to be true, the message can be discarded; otherwise the message must be delivered to the target system. Be aware that this results in chatty integrations, which may hurt performance when message throughput is high.
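A minimal sketch of this check-before-deliver approach, assuming a hypothetical `TargetSystem` that exposes both a lookup and a create operation. The `round_trips` counter is only there to make the chattiness visible.

```python
class TargetSystem:
    """Hypothetical target system exposing a lookup and a create operation."""

    def __init__(self):
        self.orders = {}
        self.round_trips = 0

    def order_exists(self, order_number: str) -> bool:
        self.round_trips += 1
        return order_number in self.orders

    def create_order(self, order_number: str, payload: dict) -> None:
        self.round_trips += 1
        self.orders[order_number] = payload


def deliver(target: TargetSystem, order_number: str, payload: dict) -> str:
    """Ask the target first; only deliver when the order is not there yet."""
    if target.order_exists(order_number):
        return "discarded"
    target.create_order(order_number, payload)
    return "delivered"


target = TargetSystem()
outcomes = [deliver(target, "SO-1", {"amount": 100}) for _ in range(2)]
print(outcomes, target.round_trips, len(target.orders))
```

Every new message costs two calls instead of one. The sketch is also single-threaded: the check and the create are separate calls, so two concurrent deliveries of the same message could both pass the check unless the target system guards against that itself.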
Idempotence can be established within the message itself. A classic example to illustrate this is a financial transaction. A non-idempotent message contains a command to increase your bank balance by €100. If this message gets processed twice, it’s positive for you, but the bank won’t like it. It’s better to create a command message that states that the resulting bank balance must be €12100. This example clearly solves idempotence, but it does not cope with concurrent transactions: two messages that each set an absolute balance will overwrite each other.
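The difference between the two message styles, and the concurrency caveat, can be sketched as follows. The function names and the starting balance of 12000 are assumptions chosen to match the €100 and €12100 figures above.

```python
def increase_balance(balance: int, amount: int) -> int:
    """Relative command: 'increase the balance by amount' (not idempotent)."""
    return balance + amount


def set_balance(balance: int, target: int) -> int:
    """Absolute command: 'the resulting balance must be target' (idempotent)."""
    return target


start = 12000  # assumed starting balance

# The same message processed twice:
after_duplicate_increase = increase_balance(increase_balance(start, 100), 100)
after_duplicate_set = set_balance(set_balance(start, 12100), 12100)
print(after_duplicate_increase, after_duplicate_set)

# Concurrency caveat: two senders both read 12000 and each compute an
# absolute target. Whichever message is applied last wins.
deposit_target = start + 100     # sender A wants to add 100
withdrawal_target = start - 50   # sender B wants to subtract 50
final = set_balance(set_balance(start, deposit_target), withdrawal_target)
print(final)
```

The duplicated absolute command is harmless, but in the concurrent case the deposit is silently lost: the final balance reflects only the withdrawal, where 12050 would have been the correct outcome.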
An idempotent message is not always an option. In such cases the receiving application must take responsibility for ensuring idempotence. This can be done by maintaining a list of message IDs that have already been processed. If a message arrives with an ID that is already on the list, it gets discarded. When it’s not possible to have a message ID within the message, you can keep a list of processed hash values. Another way of achieving idempotence is to set unique constraints on the ID of the data entity. Instead of proactively checking whether a message was already processed, you can just give it a try and handle the unique constraint exception.
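Both receiver-side techniques can be sketched in a few lines. This is an illustration under assumptions: `IdempotentReceiver`, `insert_order` and the `sales_order` table are hypothetical, the processed list lives in memory rather than in durable storage, and SQLite stands in for whatever database the receiving application uses.

```python
import hashlib
import json
import sqlite3


class IdempotentReceiver:
    """Remembers which messages were handled and discards repeats."""

    def __init__(self):
        self.seen = set()
        self.handled = []

    def _key(self, message: dict) -> str:
        # Prefer an explicit message ID; fall back to a hash of the content.
        if "id" in message:
            return "id:" + str(message["id"])
        canonical = json.dumps(message["body"], sort_keys=True)
        return "hash:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def receive(self, message: dict) -> bool:
        """Return True when the message was handled, False when discarded."""
        key = self._key(message)
        if key in self.seen:
            return False
        self.seen.add(key)
        self.handled.append(message["body"])
        return True


def insert_order(conn: sqlite3.Connection, order_number: str, amount: int) -> bool:
    """Just try the insert; a unique constraint violation means 'duplicate'."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO sales_order (order_number, amount) VALUES (?, ?)",
                (order_number, amount),
            )
        return True
    except sqlite3.IntegrityError:
        return False


receiver = IdempotentReceiver()
results = [
    receiver.receive({"id": "m1", "body": {"order": "SO-1"}}),
    receiver.receive({"id": "m1", "body": {"order": "SO-1"}}),  # same ID again
    receiver.receive({"body": {"order": "SO-2"}}),              # no ID: hashed
    receiver.receive({"body": {"order": "SO-2"}}),              # same hash again
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_order (order_number TEXT PRIMARY KEY, amount INTEGER)")
first = insert_order(conn, "SO-1", 100)
second = insert_order(conn, "SO-1", 100)
row_count = conn.execute("SELECT COUNT(*) FROM sales_order").fetchone()[0]
print(results, first, second, row_count)
```

One caveat with the hash fallback: two legitimately distinct messages with identical content would also be treated as duplicates, so it only fits cases where identical content really means the same business event.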
Lately, I see more and more SaaS providers that publish idempotent upsert services, which I can only encourage! The term upsert means that the service itself can determine whether it needs to perform an insert or an update of the data entity. Preferably, this upsert is performed based on a functional id (e.g. customer number) and not based on an internal GUID, as otherwise you do not benefit from the performance gains.
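As an illustration of an upsert keyed on a functional ID, here is a small sketch using SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` clause. The `customer` table, its columns and the sample values are hypothetical, and the sketch assumes an SQLite library of version 3.24 or later, which is when that clause was introduced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_number TEXT PRIMARY KEY, name TEXT, city TEXT)"
)


def upsert_customer(conn: sqlite3.Connection, customer_number: str,
                    name: str, city: str) -> None:
    """Insert the customer, or update it when the customer number already exists."""
    with conn:
        conn.execute(
            """
            INSERT INTO customer (customer_number, name, city)
            VALUES (?, ?, ?)
            ON CONFLICT(customer_number) DO UPDATE SET
                name = excluded.name,
                city = excluded.city
            """,
            (customer_number, name, city),
        )


upsert_customer(conn, "C-42", "Contoso", "Ghent")
upsert_customer(conn, "C-42", "Contoso", "Ghent")    # duplicate delivery
upsert_customer(conn, "C-42", "Contoso", "Antwerp")  # genuine update
rows = conn.execute("SELECT customer_number, name, city FROM customer").fetchall()
print(rows)
```

Because the caller addresses the record by customer number, it never has to look up an internal identifier first, and a repeated message simply rewrites the same row.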
For each integration you set up, it’s important to think about idempotence. That’s why Codit has it by default on its Integration Analysis Checklist. Investigate the probability of duplicate data entries, based on the protocols used. If your business case needs to avoid duplicates at all times, check whether the integration layer takes responsibility for this or whether the receiving system provides idempotent service endpoints. The latter is usually the best-performing choice.
Do you have other ways to deal with this? Do not hesitate to share your experience via the comments section!
Thanks for reading!