
Health and Activity Tracking Inaccurate?

Just the other day we had a new BizTalk application move into production. Shortly afterward, problems began: the running orchestration encountered an error. I opened up Health and Activity Tracking (HAT), and saw this:

[Screenshot: Health and Activity Tracking]

This error was found in the suspended orchestration and in the Application Log:

System.Data.OracleClient.OracleException: ORA-01017: invalid username/password; logon denied
at System.Data.OracleClient.OracleException.Check(OciErrorHandle errorHandle, Int32 rc)
at System.Data.OracleClient.OracleInternalConnection.OpenOnLocalTransaction(String userName, String password, String serverName, Boolean integratedSecurity, Boolean unicode, Boolean omitOracleConnectionName)
at System.Data.OracleClient.OracleInternalConnection..ctor(OracleConnectionString connectionOptions)
at System.Data.OracleClient.OracleConnectionFactory.CreateConnection(DbConnectionOptions options, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningObject)
at System.Data.ProviderBase.DbConnectionFactory.CreatePooledConnection(DbConnection owningConnection, DbConnectionPool pool, DbConnectionOptions options)
at System.Data.ProviderBase.DbConnectionPool.CreateObject(DbConnection owningObject)
etc.

Because HAT showed a send shape as the last thing to start executing, I immediately assumed there was a problem with the Oracle Send Port associated with that send shape. A teammate, also trying to solve the problem, assumed the same. We were both frustrated because this problem hadn’t happened in dev or test: what was different now? We tried many things to get the production problem solved, but our efforts were in vain. The system ended up being rolled back and the old infrastructure put back in place, a true disaster.

A day later, after some of the pain and sorrow had subsided, I contacted Microsoft. I was put in touch with a great support person, who looked at the problem and started asking questions. I was a little frustrated with his questions at first, but now I know why he was asking them. He kept asking about database connections we were opening in C# code, rather than about what seemed obvious to me: that this was related to the Oracle Adapter Static One-Way Send Port. He eventually explained why he refused to accept the problem at face value: 1) the error didn’t indicate anything about the Oracle Adapter, and 2) HAT is not always accurate! He mentioned that he has seen many instances where HAT does not show the exact node where the problem is occurring; furthermore, he has even seen HAT’s debug values be incorrect!

I couldn’t believe what I was hearing! Once he said this, I started to wonder… well, if the Oracle Adapter send port is not the problem (which would make sense since an earlier call to the same database via a solicit-response Oracle send port had worked), what could it be? I began examining the next node after the send, and found the node SiebelUpdate, which was trying to update an Oracle database via C# code. Immediately things started to click – right before deployment, I had been asked to use a new username/password for the Oracle Siebel connection. I had tested all of the previous credentials to avoid this kind of problem (there were 4 data sources being connected to in about 7 or 8 different ways), but I hadn’t tested the new one that had been given to me. I tried opening up the Siebel database using the credentials I had been given, and guess what, same error.
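The root cause was a credential change made right before deployment that never got tested. One way to guard against that is a small pre-deployment smoke test that tries every configured connection and reports all failures, not just the first. The Python sketch below assumes each data source can be described by a name plus a connect function; the `DataSource` structure, the names, and the fake connectors are hypothetical stand-ins for whatever driver wrapper you actually use, so the example runs without a database.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class DataSource:
    """A named connection to check (hypothetical structure for this sketch)."""
    name: str
    # Opens (and closes) a connection, raising on failure. In real use this
    # would wrap your database driver; here it is injected so the sketch runs
    # without a database.
    connect: Callable[[], None]


def smoke_test(sources: List[DataSource]) -> List[Tuple[str, str]]:
    """Try every data source and return (name, error message) for each failure."""
    failures: List[Tuple[str, str]] = []
    for source in sources:
        try:
            source.connect()
        except Exception as exc:  # keep going so every bad credential is reported
            failures.append((source.name, str(exc)))
    return failures


if __name__ == "__main__":
    def accepts_login() -> None:
        pass

    def rejects_login() -> None:
        # Stand-in for a driver rejecting the new username/password.
        raise RuntimeError("ORA-01017: invalid username/password; logon denied")

    for name, message in smoke_test([
        DataSource("existing-db", accepts_login),
        DataSource("siebel-new-credentials", rejects_login),
    ]):
        print(f"FAILED {name}: {message}")
    # FAILED siebel-new-credentials: ORA-01017: invalid username/password; logon denied
```

Collecting failures instead of stopping at the first one matters when, as here, there are several data sources connected to in several different ways.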

So here’s what I learned:

1. Don’t trust HAT. If we had known this, I’m pretty sure we’d have figured this out prior to the rollback.
2. Look carefully at the error message – careful inspection showed that it wasn’t related to the Oracle Adapter.
3. Microsoft DOES use the ODBC connection for the Oracle Adapter (we had enabled logging but didn’t see anything – now I know why).
4. Don’t mistrust good old Oracle errors. I had, because there seemed to be no other explanation based on what was shown in HAT.
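Lesson 2 can be made a little more systematic. The Python sketch below is a rough reading aid, not a BizTalk tool: it pulls the ORA- code out of a .NET stack trace and lists the namespaces its frames belong to. The regular expressions are assumptions that only handle simple frames like the ones in the trace above (not generic or nested types). For our trace, every frame sits in System.Data.OracleClient or System.Data.ProviderBase, the ADO.NET Oracle provider, which is consistent with the C# code in SiebelUpdate and says nothing about the adapter (which, per lesson 3, goes through ODBC).

```python
import re
from typing import List, Optional, Tuple

# Oracle error codes look like "ORA-01017".
ORA_CODE = re.compile(r"ORA-\d{5}")
# Simple .NET frames look like "at Namespace.Type.Method(args)" or
# "at Namespace.Type..ctor(args)"; group 1 captures "Namespace.Type".
# Generic and nested types are not handled by this rough pattern.
FRAME = re.compile(r"\bat ([\w.]+?)\.(?:\.ctor|\w+)\(")


def summarize_trace(trace: str) -> Tuple[Optional[str], List[str]]:
    """Return the first ORA- code in the trace and the sorted namespaces of its frames."""
    code = ORA_CODE.search(trace)
    namespaces = {type_name.rsplit(".", 1)[0] for type_name in FRAME.findall(trace)}
    return (code.group(0) if code else None), sorted(namespaces)


if __name__ == "__main__":
    sample = (
        "System.Data.OracleClient.OracleException: ORA-01017: "
        "invalid username/password; logon denied\n"
        "at System.Data.OracleClient.OracleException.Check(OciErrorHandle errorHandle, Int32 rc)\n"
        "at System.Data.OracleClient.OracleInternalConnection..ctor(OracleConnectionString connectionOptions)\n"
        "at System.Data.ProviderBase.DbConnectionPool.CreateObject(DbConnection owningObject)\n"
    )
    print(summarize_trace(sample))
    # ('ORA-01017', ['System.Data.OracleClient', 'System.Data.ProviderBase'])
```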

Categories: BizTalk Server
  1. May 1, 2008 at 12:57 am

    Hi

    I share your frustration; this bugged me for a while.
    But although I agree this is clearly a problem with BizTalk 2006 (it wasn’t in 2004), “not trusting HAT” is taking it a bit to the extreme in my view.

    I guess the key is to understand the exact scope of the problem and the underlying reasons, so that one can know in which cases one can “trust” HAT and in which cases to be more careful.

    I’ve tried to explain this here: http://www.sabratech.co.uk/blogs/yossidahan/2008/04/and-then-just-when-you-actually-needed.html

    The point is that for all its deficiencies, HAT is still a very useful tool for the things it does well.

    BTW, the name of the shape that caused the suspension is generally available in the event log entry related to the error, and, as of R2, in the database as well, if you want to be sure you’re looking at the correct place when troubleshooting.

  2. fehlberg
    May 1, 2008 at 8:01 am

    Great comment Yossi. Your blog entry was very useful in understanding the “why” of the problem I saw. And yes, I was exaggerating a bit when I said, “Don’t trust HAT.” HAT is still a very useful tool, but it’s important to understand that it shows the last good persistence point, as your blog suggested.

  3. June 1, 2008 at 4:41 pm

    Hi Again,

    Some thoughts from my experience that might help:

    1. I tend not to use HAT in troubleshooting until I have ruled out a few other things. I find the EventLog/MOM Alerts are always the best first port of call for information. As Yossi says, I think the HAT thing is more a side effect of being able to resume an orchestration. I’ve never had this stop me from solving a problem. Yes, it would be handy, but too often I think people get hung up on HAT as a one-stop shop for troubleshooting BizTalk and forget the other sources of information that are available.

    2. As a suggestion: when you call out from an expression shape, I encourage my teams to always call an external assembly, preferably with some custom exception handling in it. Even without that, you would see from the error’s stack trace that it clearly came from your custom code.

    Hope these help
    Mike
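Mike’s second suggestion concerns C# helper assemblies called from an expression shape. The same idea can be sketched in Python: a hypothetical `run_against` helper catches any driver error and re-raises a custom `DataSourceError` that names the data source and operation, chaining the original error with `raise ... from` so the underlying Oracle message is not lost. The helper, the exception class, and the "Siebel"/"SiebelUpdate" labels are illustrative assumptions, not part of BizTalk or any driver API.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class DataSourceError(Exception):
    """Error raised by the helper layer that names the data source involved."""

    def __init__(self, source: str, operation: str) -> None:
        super().__init__(f"{operation} against data source '{source}' failed")
        self.source = source
        self.operation = operation


def run_against(source: str, operation: str, action: Callable[[], T]) -> T:
    """Run a database action and wrap any failure with the data source's name.

    `raise ... from exc` stores the original driver error as __cause__,
    so the full traceback still shows it underneath the wrapper.
    """
    try:
        return action()
    except Exception as exc:
        raise DataSourceError(source, operation) from exc


if __name__ == "__main__":
    def failing_update() -> None:
        # Stand-in for a driver call rejected at logon.
        raise RuntimeError("ORA-01017: invalid username/password; logon denied")

    try:
        run_against("Siebel", "SiebelUpdate", failing_update)
    except DataSourceError as err:
        print(err)            # SiebelUpdate against data source 'Siebel' failed
        print(err.__cause__)  # ORA-01017: invalid username/password; logon denied
```

With a wrapper like this, the top of the error names the failing connection directly, so there is less need to infer the location from HAT.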
