Convoys are frequently used in BizTalk Server orchestrations. Sometimes convoys behave counterintuitively: messages and orchestrations get suspended in an unpredictable manner. This issue is well known, and the suspended messages are called "zombies". The name is unofficial, but the issue is real. Here I describe in detail when and why these zombie situations happen. See the short description of zombies in MSDN.


An orchestration can be enlisted with many subscriptions. In other words, it can have several Receive shapes. Usually the first Receive creates the activation subscription, while the other Receives create instance subscriptions. [See “Publish and Subscribe Architecture” in MSDN]

Here is a sample process.


This orchestration has two receives. It is a typical Sequential Convoy. [See "BizTalk Server 2004 Convoy Deep Dive" in MSDN by Stephen W. Thomas].

Let's start the experiment.

There are three possible scenarios, depending on the message sequence.

First scenario: everything is OK


The activation subscription for the Sample message is created when the SampleProcess orchestration is enlisted.

The instance subscription is created only when a SampleProcess orchestration instance starts, and it is removed when that instance ends.

So far so good: the Sample_2 message is delivered within exactly this time interval and is consumed.
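The subscription lifetime described above can be sketched with a toy model. This is not BizTalk code and calls no BizTalk API: `ToyMessageBox` and its methods are hypothetical names invented for illustration, and the model assumes only what the text states, namely that enlisting creates the activation subscription and that the instance subscription lives from instance start to instance end.

```python
# Toy model of activation vs. instance subscriptions.
# All names are hypothetical; this is not the BizTalk API.

class ToyMessageBox:
    def __init__(self):
        self.activation = {}   # activating type -> follow-up type
        self.instance = {}     # follow-up type -> id of the running instance
        self.next_id = 1

    def enlist(self, activating_type, follow_up_type):
        """Enlisting creates the activation subscription only."""
        self.activation[activating_type] = follow_up_type

    def publish(self, msg_type):
        """Route one message and report what happened to it."""
        if msg_type in self.instance:
            return f"routed to instance {self.instance[msg_type]}"
        if msg_type in self.activation:
            inst = self.next_id
            self.next_id += 1
            # Starting the instance creates its instance subscription.
            self.instance[self.activation[msg_type]] = inst
            return f"started instance {inst}"
        return "no subscribers"

    def end_instance(self, inst):
        """Ending the instance removes its instance subscription."""
        self.instance = {t: i for t, i in self.instance.items() if i != inst}


box = ToyMessageBox()
box.enlist("Sample", "Sample_2")
before_start = dict(box.instance)   # no instance subscription yet
r1 = box.publish("Sample")          # activation subscription fires
r2 = box.publish("Sample_2")        # instance subscription exists now
box.end_instance(1)
after_end = dict(box.instance)      # instance subscription removed again
print(r1, "|", r2, "|", before_start, after_end)
```

In this sketch the Sample_2 message is routed only because it is published between the start and the end of instance 1, which is the "everything is OK" case.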

Second scenario: no consumers

 

Three Sample_2 messages are delivered. The first one is delivered before SampleProcess starts, that is, before the instance subscription is created. The second message is delivered in the correct time interval. The third one is delivered after the SampleProcess orchestration has ended and the instance subscription has been removed.

Note: The Sample_2 message that gets consumed is not the first one delivered. The first message did not wait in a queue for a subscription to appear: it was suspended the moment it was delivered to the MessageBox, because it had no subscribers at that moment.
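The timing rule in this scenario can be written down as a small function. This is a minimal sketch under stated assumptions: the function name `route` and the timestamps (in seconds) are made up for illustration, and the only rule modeled is that a message is routed if the instance subscription exists at the moment of delivery and is suspended otherwise, with no waiting.

```python
def route(delivery_time, sub_created, sub_removed):
    """Fate of one Sample_2 message, decided at the moment of delivery."""
    if sub_created <= delivery_time < sub_removed:
        return "routed"
    # Messages are not queued until a subscriber appears.
    return "suspended: no subscribers"


# Hypothetical timeline: the instance starts at t=10 and ends at t=20.
deliveries = [5, 15, 25]
fates = [route(t, sub_created=10, sub_removed=20) for t in deliveries]
print(fates)
```

Only the delivery at t=15 falls inside the subscription window; the early and the late messages are suspended immediately, regardless of their order of arrival.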

The first and the last Sample_2 messages are Suspended (Nonresumable) in the MessageBox. For each of them, two (!) service instances are created. One service instance has the ServiceClass of Messaging, and its Error Description is:

 

The second service instance has the ServiceClass of RoutingFailureReport, and its Error Description is:

 

Third scenario: something goes wrong

 

Two Sample_2 messages are delivered. Both are delivered in the same interval, while the SampleProcess orchestration is running and the instance subscription exists.

The first Sample_2 is consumed. The second Sample_2 matches the subscription, but the subscriber, the SampleProcess orchestration, never consumes it. After the SampleProcess orchestration ends (and only after; I will discuss this in the next article), the orchestration is suspended (Nonresumable). Only one service instance is suspended this time. This service instance has the ServiceClass of Orchestration, and its Error Description is:

 

In the Message tab the Sample_2 message is in the Suspended (Resumable) status.

Notes:

  • The orchestration takes the extra message(s) and gets suspended together with them. These messages are not consumed in the sense of “processed by the orchestration”, but they are consumed in the sense of “delivered to the subscriber”. The Receive shape in the orchestration never receives these extra messages, yet they are routed to the orchestration. That is why the error information looks ambiguous.
  • The time window between the last Receive shape and the end of the orchestration is a "dangerous zone". The message delivery pattern should be scrutinized to avoid it.
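The difference between "delivered to the subscriber" and "processed by the orchestration" can be illustrated from the instance's side. The sketch below is a toy model, not BizTalk code: `ToyInstance` and its methods are hypothetical, and it assumes only the behavior observed above, namely that routing succeeds while the subscription exists and that leftover messages surface only when the instance ends.

```python
class ToyInstance:
    """Hypothetical stand-in for one SampleProcess instance."""

    def __init__(self):
        self.inbox = []          # messages routed via the instance subscription
        self.state = "Running"

    def deliver(self, msg):
        # Routing succeeds while the instance subscription exists,
        # whether or not a Receive shape is still waiting.
        self.inbox.append(msg)

    def receive(self):
        # A Receive shape takes the oldest routed message.
        return self.inbox.pop(0)

    def end(self):
        # Leftover messages are discovered only when the instance ends.
        if self.inbox:
            self.state = "Suspended (Nonresumable)"
            return {msg: "Suspended (Resumable)" for msg in self.inbox}
        self.state = "Completed"
        return {}


inst = ToyInstance()
inst.deliver("Sample_2 #1")
inst.deliver("Sample_2 #2")     # routed, but no Receive shape is left for it
consumed = inst.receive()       # the single Receive shape of the sample process
leftovers = inst.end()
print(consumed, "|", inst.state, "|", leftovers)
```

The second message is never seen by a Receive shape, yet it belongs to the instance, so the instance and the message end up suspended together.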

Uniform sequential convoy

Now consider one more scenario.



It is a uniform sequential convoy: the activation subscription is for the same message type as the instance subscription. The Sample_2 message is now a Sample message. For simplicity, the SampleProcess orchestration consumes only two Sample messages. In this scenario an orchestration usually consumes many messages inside a loop, but here there are only two.

The first message starts the orchestration; the second message goes to the same orchestration instance. Then the next pair of messages follows, and so on.

But if the input messages arrive at shorter intervals, we have a problem.

 

We lose messages in an unpredictable manner.
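The effect of the arrival interval can be reproduced with a small simulation. This is a sketch under assumptions, not a model of the real engine: `run_convoy` is a hypothetical function, `tail` is an assumed duration between the second Receive and the end of the instance, and a message that arrives while an instance subscription exists is assumed to be routed to that instance rather than to start a new one.

```python
def run_convoy(arrivals, tail):
    """Simulate a convoy that consumes two messages of the same type.

    arrivals: sorted arrival times of Sample messages, in seconds.
    tail: assumed time between the second Receive and the instance end.
    Returns (pairs_received, lost_arrival_times).
    """
    pairs, lost = 0, []
    first = None      # arrival time of the activating message, if an instance is open
    ends_at = None    # known once the second message has been received
    for t in arrivals:
        if ends_at is not None and t >= ends_at:
            first = ends_at = None    # instance ended, subscription removed
        if first is None:
            first = t                 # activation: a new instance starts
        elif ends_at is None:
            ends_at = t + tail        # second Receive consumes this message
            pairs += 1
        else:
            lost.append(t)            # routed to the instance, never received
    return pairs, lost


slow = run_convoy([0, 2, 4, 6, 8, 10], tail=1)         # one message every 2 s
fast = run_convoy([0, 0.5, 1, 1.5, 2, 2.5], tail=1)    # one message every 0.5 s
print(slow)
print(fast)
```

With the slow feed every message lands in a pair. With the fast feed, the messages that arrive during the tail of an instance are lost, and which ones are lost depends entirely on how the arrival times line up with the instance lifetimes.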


Conclusion:

  • Perhaps it would be better if BizTalk removed the instance subscription as soon as the message is consumed, not at the end of the orchestration. The current behavior looks like a bug, but for now it is a “feature” of the BizTalk subscription mechanism.
  • The time window between the last Receive shape and the end of the orchestration is a "dangerous zone". The message delivery pattern should be scrutinized to shrink this zone as much as possible.

Note:

  • Several times I have seen zombies explained as being created in the window between the moment an orchestration is scheduled for disposal and the moment it is actually disposed. By that account, the average dangerous window is about half of the MessageBox polling interval (1 sec by default), that is, 0.5 sec. This is not correct. The dangerous zone lies between the last Receive (for a message with an instance subscription) and the end of the orchestration, and it can be much longer than half a second.
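To see why the size of the zone matters, here is a back-of-the-envelope estimate. It rests on an assumption that is not in the article: matching messages arrive at a constant average rate, so the expected number of extra messages routed during the zone is simply the rate multiplied by the zone length. The rate and the 30-second zone below are hypothetical numbers chosen for illustration.

```python
def expected_zombies(arrival_rate, zone_seconds):
    """Expected number of extra messages routed during the dangerous zone,
    assuming a constant average arrival rate (messages per second)."""
    return arrival_rate * zone_seconds


rate = 2.0                                  # hypothetical: two matching messages per second
short_zone = expected_zombies(rate, 0.5)    # half of the default polling interval
long_zone = expected_zombies(rate, 30.0)    # e.g. 30 s of work after the last Receive
print(short_zone, long_zone)
```

Under these assumed numbers the estimate grows from 1 to 60 extra messages per instance, which is why the work placed after the last Receive deserves attention.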

See more about zombies in BizTalk on the BizTalk Core Engine Blog.