A Problem with BizTalk Server 2006 Performance – 100% CPU Utilization
A few weeks ago, a new application was deployed at work that took about 45–60 seconds to run through one cycle of a particular orchestration, which was set up as a singleton. As time went by, the process seemed to take longer – instead of 60 seconds, it was taking closer to 3 minutes. This bothered me a bit, but I wasn’t sure just what to do. Around this time, I noticed that the host instance process, which used to consume only a few CPU cycles on the server, began taking 25% of the four CPU cores (100% of one core). Although I know BizTalk uses more than one thread in its processing, I couldn’t help but be suspicious – it sure seemed like a process was out of control.
Nonetheless, the process was finishing, and I had plenty of other pressing matters to worry about, so after trying a couple of simple things to speed it up (all of which failed), I left the application alone. Yesterday, I noticed that many of the instances now running came from messages received more than 24 hours earlier. That set off alarms; a meeting was held to figure out what to do. By the end of the day the problem still wasn’t solved – which, translated, means that Victor was going to work on Saturday until he figured things out. I searched the internet and found many articles on performance. Most of them I had already read, or had taken into account when setting up the BizTalk Server, but a few were new to me. Each “new” issue I found seemed promising – I crossed my fingers as I implemented each fix, but to no avail. Nothing helped.
So I called Microsoft to get help. I chatted on the phone with Sajid, one of their good support engineers, and described the issue. He ran through a handful of possible causes (similar to what I had seen in other articles) to see if he could quickly isolate the problem. No luck. I might add that it was comforting that he, too, suspected an issue with the SQL Server database – but we found nothing. He didn’t have internet access at the time (after all, it was a weekend), so he called back later. He looked at the issue and asked how many messages were queued for the process. Unfortunately the admin console doesn’t have a count feature, but after I limited the query to show 5,000 results, and the tool reported that there were still more that weren’t being shown, Sajid became suspicious. We terminated the running orchestration (the singleton), since I could recreate the input messages, and voilà! Problem solved.
So, you may be curious about what the problem was in the first place. That is, why were there so many messages? Well, as you may know, messages consumed by an orchestration still show up in the orchestration’s queue in the MessageBox until the orchestration instance completes. A singleton that never terminates will eventually build up such a large queue of messages (even though they are marked “consumed”) that, apparently, the host instance runs wild simply trying to pick out the messages it needs for its own processing. A more sophisticated singleton design would have the orchestration complete if no new messages arrived within a given window, say 15 minutes.
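To make that last idea concrete, here is a rough XLANG/s-style pseudocode sketch of a singleton that shuts itself down after a quiet period, using a Listen shape with a Delay branch. All shape, message, and timeout names here are illustrative, not taken from the actual application:

```
// Pseudocode sketch – a self-terminating singleton orchestration.
// The correlation set on msgIn is what makes this a singleton.

Receive (Activating) msgIn        // first message activates the instance
                                  // and initializes the correlation set
...process msgIn...

loop (forever)
    Listen
        Branch 1: Receive msgIn   // follow-up message, matched by correlation
            ...process msgIn...   // loop again and keep listening
        Branch 2: Delay 00:15:00  // no new message for 15 minutes
            exit loop             // leave the loop instead of waiting forever
end loop

// The orchestration completes here. Once the instance is done, its consumed
// messages can be cleaned out of the MessageBox, so the queue stays small.
// A new activating message simply starts a fresh singleton instance.
```

One caveat worth noting with this pattern: a message can arrive in the small window between the Delay firing and the instance completing, producing a “zombie” instance, so the timeout value has to be weighed against how often that race is acceptable.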
I hope that as a follow-up to this post I’ll have enough time to show some better singleton designs. I know posts without pictures can be boring. =)