BizTalk R2 WCF Authorization

Let me share a quick story.  We’ve been using BizTalk Server 2006 for a couple of years now at my company.  We recently upgraded to R2, and I was excited to get the opportunity to work with WCF… so a few months ago I created my first WCF service.  Things worked about as I expected except for one thing: authorization.  I kept thinking, “this has got to be really easy,” yet I could never figure out how to restrict access to certain users (this was fairly easy to do with ASMX services).  That’s what led me to write this post.

There are a few things I like a lot more about WCF, but there’s one thing I can’t stand: the way authorization is implemented.  Here’s why:

From what I’ve seen and heard, WCF is intended to make security easier by allowing this to be configured in XML configuration files.  This is neat because the developer can do his/her work easily, and the details on security can be configured dynamically (without recompiling).   It’s indeed pretty cool that you can change from using Windows Authentication to basic auth, etc. all in a configuration file.  However, this “guiding principle” behind WCF doesn’t hold true when it comes to authorization!  I couldn’t believe my eyes when I read this post:

http://social.msdn.microsoft.com/Forums/en-US/biztalkr2adapters/thread/12a47533-acb4-4ff4-bc32-d8ea305cb066

Are you serious?  I have to write a WHOLE BUNCH of code just to restrict access to a web service?!  There’s not a wizard for this?  Or perhaps some GUI control?  No XML file for this?  Or how about clicking on “Permissions” in IIS like you used to be able to do with ASMX services?  So much for configuring security in an XML file.  I’m deeply disappointed.

In fact, I really hope I’m wrong.  Perhaps there’s some easier way that I just happened to have missed.  Please do tell me this is the case and end this bad dream.

Since I refuse to manage web service access via code for what might end up becoming hundreds (or even thousands) of web services, I’ll have to do this some other way.  I think I’ll use SOA’s Service Manager to control authorization (I should get paid for promoting them).  There it can be done easily at the operation level of a web service.  I just thought WCF would have done something like this too.


BizTalk Server 2006 R2 Support for SQL Server 2005 SP3

With the release of SQL Server 2005 SP3, many of us are wondering if BizTalk Server 2006 is supported with SP3.  I asked Microsoft this question and here was the reply:

“The BizTalk test team has planned complete testing of this. However, the Rangers team has tested this setup in a in-house test setup and didn’t encounter any issues. SQL SP3 is fully backwards compatible with SQL 2005 SP2 and hence BizTalk databases are covered (in an indirect way). We will have a fully supportability statement on this once the official tests are complete.

“We are recommending that our customers upgrade to SQL 2005 SP3 since it resolves many of the bugs with SP2 and many have upgraded their environments to SP3 successfully. We are fully committed to encourage our customers to be using the latest service pack/security patches.”

We proceeded to use SP3 for a new environment we are setting up; we haven’t encountered any issues yet but we aren’t using it in production yet (we will be in another couple of weeks).

I figured I’d share this since I’m sure it will help someone else out there.  Good luck!


Authentication and Authorization of Incoming SOAP Requests to BizTalk

I’ve been meaning to write a post on this subject for quite some time…  this post explains authentication and authorization of incoming SOAP messages to BizTalk.  Although it might seem like a simple subject to some, I’ve seen enough web apps deployed w/o authorization that I figure it’s worth writing about…

Here’s the short answer to how BizTalk authenticates and authorizes incoming SOAP messages: IIS. When I was first exposed to BizTalk I was disappointed to find that the BizTalk books I owned didn’t have a section in their index on authorization (perhaps this is the reason so many apps don’t use it).  I needed to be explicit as to who could access a particular web service.  I didn’t find anything in the BizTalk Admin Console under any of the receive locations/ports, and as a newbie to IIS, I wasn’t sure what exactly was available there.  I saw in IIS an easy way to provide authentication, but authorization wasn’t as clear to me.  So hopefully this will help the next IIS-newbie.

Let’s first talk about authentication.  Here is the authentication methods screen of the IIS Manager found by right-clicking on the deployed web app and choosing Properties:

IIS Authentication


As you can see, you have the option to allow anonymous access, Integrated Windows Authentication (works only for SOAP callers using Windows), Digest authentication (also only for Windows), Basic authentication (very common when working w/non-Windows systems; simple, but only use it w/HTTPS since the username/password is sent in the clear), and .NET Passport authentication (also for Windows).  I won’t explain each of these here because I’m sure it’s explained well on MSDN.

Now, for the part that I couldn’t find in the indices of BizTalk books… authorization.  Authorization for incoming SOAP messages into BizTalk is also implemented via IIS.  I was pleased to see it’s pretty simple and thorough, once I actually found it.  The trick is not to look at the properties of the web app, but rather to right-click and choose Permissions.

Authorization in IIS


IIS Authorization/Permissions


Now, here’s where permissions (authorization) are set.  Grayed out boxes indicate that the permissions are being inherited from a parent.  In the case of BizTalk, the default setting for web apps is to be deployed under the Default Web Site, listening on port 80.   If you stick with this default, you’ll want to be sure to use the minimal set of permissions at the parent, and be more specific for each web app underneath.  For example, if A and B are children of Default Web Site, you will likely want one set of permissions for A and a separate set for B, meaning that you’ll want to limit the common permissions set by default at the parent.  And, if you didn’t guess already, you can’t use authorization (permissions) unless you authenticate a user using one of the methods described earlier (hence authorization w/anonymous access makes no sense).

Good luck!


DTA Orphaned Instances

BizTalk Server 2006 (including R2 apparently – see warning section of this page) seems to have a bug, for which I’ve seen no fix, that affects the performance and size of the DTA (BizTalkDTADb) database because it fills it up with orphaned instances.  You can detect these using the following query:

SELECT COUNT(*)
FROM [dbo].[dta_ServiceInstances]
WHERE [dtEndTime] IS NULL
  AND [uidServiceInstanceId] NOT IN (
    SELECT [uidInstanceID] FROM [MSGBOXSERVER].[BIZTALKMSGBOXDB].[dbo].[Instances]
    UNION
    SELECT [StreamID] FROM [MSGBOXSERVER].[BIZTALKMSGBOXDB].[dbo].[TrackingData]
  )

These can also be detected by the MsgBoxViewer, a great tool that I’d recommend for all BizTalk administrators.

As you may be able to see from the query above, an orphaned instance is one that never finishes.  This can happen for a few, very common, reasons.  For example, an orchestration might throw an exception, or might be terminated by an administrator.  It seems silly to me that these stay in your DTA database forever, but nonetheless, they do.

So how do you fix this?  You can run this update command:

UPDATE [dbo].[dta_ServiceInstances]
SET [dtEndTime] = GetUTCDate()
WHERE [dtEndTime] IS NULL
  AND [uidServiceInstanceId] NOT IN (
    SELECT [uidInstanceID] FROM [MSGBOXSERVER].[BIZTALKMSGBOXDB].[dbo].[Instances]
    UNION
    SELECT [StreamID] FROM [MSGBOXSERVER].[BIZTALKMSGBOXDB].[dbo].[TrackingData]
  )

Here I set [dtEndTime] = GetUTCDate(), but you might want to change this after taking into consideration the “soft delete” date specified in your DTA purge job.  If you have a soft delete date of 14 days, for example, you might want to set this to GetUTCDate()-14 (equivalently, DATEADD(day, -14, GetUTCDate())) so that the next time the DTA purge and archive job runs it will clear out these instances.
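To make the back-dating arithmetic concrete, here is a minimal Python sketch. It is not BizTalk code: the function names are hypothetical, and the purge rule (a completed instance is removed once its end time is older than the soft delete window) is my simplified assumption about how the purge job decides what to delete.

```python
from datetime import datetime, timedelta, timezone


def backdated_end_time(now_utc, soft_delete_days):
    """End time to stamp on an orphaned instance so that it already
    looks soft_delete_days old (hypothetical helper)."""
    return now_utc - timedelta(days=soft_delete_days)


def is_purgeable(end_time, purge_run_utc, soft_delete_days):
    """Assumed purge rule: a completed instance is removed once its end
    time is older than the soft delete window."""
    return end_time < purge_run_utc - timedelta(days=soft_delete_days)


if __name__ == "__main__":
    fixed_at = datetime(2009, 1, 10, 12, 0, tzinfo=timezone.utc)
    next_purge = fixed_at + timedelta(hours=1)
    # Stamped with "now": stays in the database for another 14 days.
    print(is_purgeable(fixed_at, next_purge, 14))
    # Stamped 14 days in the past: eligible on the next purge run.
    print(is_purgeable(backdated_end_time(fixed_at, 14), next_purge, 14))
```

Under that assumption, stamping the current time keeps the orphans around for a full soft delete window, while back-dating them makes them eligible on the very next run.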


Using MOM to Restart a BizTalk Server 2006 Host Instance Based on a Problem

A project that was deployed at work has a terrible tendency to freeze a host instance. Once the problem occurs, the host instance sits in a zombie state until it is restarted. The causes, which can be credited to 2 or 3 different bugs (out of our direct control), are being investigated (some are described in my earlier posts), but we have at least 1 or 2 out there that haven’t yet been solved (and may not be). So, we’d like the known errors to trigger a Microsoft Operations Manager (MOM) alert, which would have one of its responses be the execution of a script.

This sounds simple, right? Well, not really. I started with the RestartBizTalkHostInstances.vbs script installed with BizTalk Server (e.g. C:\Program Files\Microsoft BizTalk Server 2006\SDK\Samples\ApplicationDeployment\VisualStudioHostRestart). Running the script didn’t work. So I then did some searching and found this helpful article. It taught me a thing or two, and I modified the script to avoid certain statements, like WScript.Echo. Great.

But the script still didn’t work. This is where things get interesting. It turns out that MOM runs on the BizTalk Server (or any server for that matter) as “Local System”. So, to test a script, or to find out what’s really wrong with it (beyond some useless error in your event log), you need more information. You can start by running a command like this: 'at "09:37" /interactive cmd.exe'. This command, as written, opens a command prompt at 9:37am that can be used to run other commands. In this example, 9:37 happened to be 1 minute ahead of the then-current time. From the new command prompt you can run scripts under the same “Local System” credentials that MOM uses (knowing this trick might help with other problems in the future). In my case, I received another error, which wasn’t particularly helpful.
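Since the time passed to “at” has to be a minute or so in the future, it can be handy to generate the command rather than type it. Here is a small Python sketch that only builds the command string; the helper name is hypothetical and it does not run anything.

```python
from datetime import datetime, timedelta


def at_command(now, minutes_ahead=1):
    """Build the 'at' command line that opens an interactive prompt
    minutes_ahead minutes after 'now' (hypothetical helper)."""
    run_at = now + timedelta(minutes=minutes_ahead)
    return 'at "{:%H:%M}" /interactive cmd.exe'.format(run_at)


if __name__ == "__main__":
    # Print the command to paste into a console on the BizTalk server.
    print(at_command(datetime.now()))
```

If the current time is within a few seconds of the next minute, passing minutes_ahead=2 leaves a safer margin.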

So, what I did next was run ‘wbemtest’, which opens up a WMI testing utility. This utility allows you to try executing WMI queries and the like, and instead of doing so via scripting, you have a GUI that aids in the process.

Wbemtest Tool

By pressing ‘Connect’ I can enter the namespace to connect to. In my case the WMI script I’m trying to run shows:

Set objWMIService = GetObject("winmgmts://./root/MicrosoftBizTalkServer")

In the GUI you enter something slightly different…

Specify WMI Namespace

After connecting, you can then try to simulate your WMI script. Mine needs to run a query, so I press the Query button and copy from the WMI script:

When pressing “Apply” I got this error:

WMI Script Error

If I press “More Information” and scroll down a bit, I find a property called “Description.” Double clicking it shows this error:

BizTalk Server cannot access SQL server.  This could be due to one of the following reasons:
1. Access permissions have been denied to the current user.  Either log on as a user that has been granted permissions to SQL and try again, or grant the current user permission to access SQL Server.
2. The SQL Server does not exist or an invalid database name has been specified.  Check the name entered for the SQL Server and database to make sure they are correct as provided during SQL Server installation.
3. The SQL Server exists, but is not currently running.  Use the Windows Service Control Manager or SQL Enterprise Manager to start SQL Server, and try again.
4. A SQL database file with the same name as the specified database already exists in the Microsoft SQL Server data folder.

Internal error from OLEDB provider: "Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'."

The interesting part of course is the last section, where it becomes apparent that the script is trying to run with anonymous credentials (and of course this is not allowed). So what now?

If you go to the MOM folder in a command window, e.g. C:\Program Files\Microsoft Operations Manager 2005, and type 'setactionaccount <MOM Mgmt Group> -query', you will see the credentials used for the action account (if nothing shows up, you are running as local system). You can change this. You can try setting the credentials using the 'set' option of the same setactionaccount command. This may or may not work depending on the MOM setup. It didn’t work for me.

The other way to set this is to use the MOM Administrator Console. There you can find the agent computer, in this case the BizTalk Server, and specify the action account to be used for running scripts (and the like). After doing this, you should be able to repeat the same command, 'setactionaccount <MOM Mgmt Group> -query', and you should see the new credentials that were set. You may have to wait a minute for the change to take effect.

By the way, if you don’t know the <MOM Mgmt Group> you can find this by looking at the Console Settings of the Administrator or Operator console of MOM.

Once I did all of this, things started working like a charm! I hope this helps someone out there – let me know.


BizTalk Singleton Orchestration Design

Here’s a white paper from Microsoft on creating sequential FIFO orchestrations (this applies to singletons as well), as a follow up to the blog I wrote a few days ago on poorly written singletons. If you read the paper carefully, and pay attention to the “warning” sections, you’ll see that it’s actually very hard to create an orchestration that neatly ends with zero risk of losing messages. This is because a message might come in after the listen shape, but before the orchestration has terminated.

To test this, I created a simple singleton orchestration that adds a deliberate wait of 2 minutes before finishing.

I then deployed the orchestration, and started sending in messages one by one (by the way, “Do Something” simply writes the one field in the message to the debugger).

So the message with value 1 was sent first, here’s the debugger output:

[12880] Field was: 1

Then the message with value 2 was sent in, here’s the output:
[12880] Field was: 2

Then, I waited about 45 seconds, just long enough to get us past the first Delay shape in the orchestration (of 30 seconds).  I submitted a message with value 3, but received no output (as expected).  I waited until the remaining time had finished and saw the orchestration suspend with this error:

The orchestration was not resumable.  I then sent in a message with value 4, here’s the output:
[12880] Field was: 4

So, what does this prove?  If a message comes in before your orchestration has had time to complete (and you’re no longer waiting to receive a message), you will have unprocessed messages in the orchestration when it goes to complete.  I guess the good news is that you can see what that message was by clicking on the message tab, so if you’re willing to run this risk, you might go ahead with this decent, but not perfect, design.
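The timing in the test above can be modelled in a few lines of Python. This is a deliberately simplified model, not BizTalk behaviour: the function name and the two parameters are my own, and it assumes each received message restarts the listen window, the orchestration takes a fixed time to shut down once the window expires, and any message arriving during that shutdown is stranded.

```python
def classify_arrivals(arrivals, listen_timeout, shutdown_delay):
    """Label each arrival time (in seconds) as 'processed' or 'stranded'
    under the simplified singleton model described above."""
    outcomes = []
    listen_deadline = None  # None means no instance is currently running
    for t in sorted(arrivals):
        # The previous instance has fully finished by this time.
        if listen_deadline is not None and t >= listen_deadline + shutdown_delay:
            listen_deadline = None
        if listen_deadline is None or t < listen_deadline:
            # Either a new instance starts, or the listen is still active.
            outcomes.append((t, "processed"))
            listen_deadline = t + listen_timeout
        else:
            # Arrived after the listen expired but before termination.
            outcomes.append((t, "stranded"))
    return outcomes


if __name__ == "__main__":
    # 30 s listen window, 90 s until the instance finally ends.
    for arrival, outcome in classify_arrivals([0, 10, 55, 200], 30, 90):
        print(arrival, outcome)
```

With those numbers, the arrivals at 0 and 10 seconds are processed, the one at 55 seconds is stranded, and the one at 200 seconds starts a fresh instance, which mirrors messages 1 through 4 in the test.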

To be absolutely fail-proof, the paper offers a few suggestions, such as stopping the receive location via a WMI script as part of the shutdown process.  This is fine and all, except for one question – how is it supposed to be turned back on? If you do this as part of the same orchestration, you have the same problem you started with!  I guess the one way this could be done would be by adding a “Start Orchestration” shape, which begins with a delay, and then enables the receive location again (the delay is to allow time for the calling orchestration to finish with no risk of losing messages).

Good luck!


Help w/BizTalk Server 2006/SQL Server Errors

I’d like to see if someone out here has seen any of these errors before… I couldn’t find anything on the internet so far, but I imagine they have a common cause. Ideas?

Severity: Critical Error
Status: New
Source: BizTalk Server 2006
Name: Critical Error: A stored procedure call failed.
Description: The following stored procedure call failed: " { call [dbo].[bts_GetServiceStaticTrackingInfo]( ?)}". SQL Server returned error string: "".

Severity: Critical Error
Status: New
Source: BizTalk Server 2006
Name: Critical Error: A stored procedure call failed.
Description: The following stored procedure call failed: " { call [dbo].[bts_InsertDynamicStateInfo_BizTalkServerApplication]( ?, ?, ?, ?, ?, ?, ?, ?)}". SQL Server returned error string: "".

Severity: Critical Error
Status: New
Source: BizTalk Server 2006
Name: Critical Error: A stored procedure call failed.
Description: The following stored procedure call failed: " { call [dbo].[bts_GetDynamicStateInfo_BizTalkServerApplication]( ?, ?, ?)}". SQL Server returned error string: "".

Severity: Critical Error
Status: New
Source: BizTalk Server 2006
Name: Critical Error: A stored procedure call failed.
Description: The following stored procedure call failed: " { call [dbo].[bts_CreateSubscription_ACCISHost]( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)}". SQL Server returned error string: "Cannot create new connection because in manual or distributed transaction mode.".

Severity: Critical Error
Status: New
Source: BizTalk Server 2006
Name: Critical Error: A stored procedure call failed.
Description: The following stored procedure call failed: " { call admsvr_SendPortToPEP( ?)}". SQL Server returned error string: "".


A Problem with BizTalk Server 2006 Performance – 100% CPU Utilization

A few weeks ago, a new application was deployed at work that was taking about 45-60 seconds to run through a cycle of a particular orchestration that was set up as a singleton. As time went by, the process seemed to take longer – instead of 60 seconds, it seemed to be taking 3 minutes. This bothered me a bit, but I wasn’t sure just what to do. Around this time, I noticed that the host instance process, which used to take only a few CPU cycles on the server, began taking 25% of the 4 CPU cores (100% of one core). Although I know BizTalk uses more than one thread in its processing, I couldn’t help but be suspicious – it sure seemed like a process was out of control.

Nonetheless, the process was finishing, and I had lots of other pressing matters to worry about, so after trying a couple of simple things to speed things up (which all failed), I left the application alone. Yesterday, I noticed that many of the processes that were now running came from messages received over 24 hours prior. This caused an alarm; a meeting was held to figure out what to do. At the end of the day the problem still wasn’t solved – which interpreted means that Victor was going to work on Saturday until he figured things out. I searched on the internet and found many articles on performance. Most of them I had read already, or had been taken into account when setting up the BizTalk Server, but there were a few that were new to me. Each “new” issue I found seemed promising – I crossed my fingers as I implemented each fix, but to no avail. Nothing seemed to help.

So I called Microsoft to get help. I chatted with Sajid, one of their good support people, on the phone and described the issue. He ran through a handful of possible causes (similar to what I had seen in other articles), to see if he could quickly isolate the problem. No luck. I might add that it was comforting that he too suspected an issue with the SQL Server database – but we found nothing. He didn’t have internet access with him at the time (after all, it was a weekend), so he called back later. He looked at the issue, and asked how many messages were in queue for the process. Unfortunately the admin console doesn’t have a count feature, but after limiting the query to show 5000 results, and having the tool report that there were still more that weren’t being shown, Sajid became suspicious. We terminated the orchestration that was running (the singleton) since I could recreate the input messages, and voilà! Problem solved.

So, you may be curious as to what the problem was in the first place. That is, why were there so many messages? Well, as you may know, messages that are consumed by an orchestration still show up in the message holder of the orchestration until the orchestration finishes. A singleton that doesn’t terminate will eventually accumulate such a large queue of messages (even if they are marked “consumed”) that the host instance apparently runs wild while simply trying to pick up the messages that it needs for its own processing. A more sophisticated singleton design would cause the orchestration to complete if no new messages had been received after a given time, say 15 minutes.
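The idle-timeout idea can be sketched outside of BizTalk with a plain Python queue. This is only an illustration of the control flow, not an orchestration: run_singleton, the inbox queue and the handle callback are all hypothetical names.

```python
import queue


def run_singleton(inbox, handle, idle_timeout_s):
    """Process messages one at a time and finish once nothing has
    arrived for idle_timeout_s seconds. Returns the number handled."""
    processed = 0
    while True:
        try:
            # Equivalent of a listen shape: a receive branch and a delay branch.
            message = inbox.get(timeout=idle_timeout_s)
        except queue.Empty:
            # Delay branch won: end this instance so consumed messages are
            # released; the next message would start a new instance.
            return processed
        handle(message)
        processed += 1


if __name__ == "__main__":
    inbox = queue.Queue()
    for value in (1, 2, 3):
        inbox.put(value)
    # A short timeout for the demo; the post suggests something like 15 minutes.
    count = run_singleton(inbox, lambda m: print("Field was:", m), 0.1)
    print("handled", count, "messages before going idle")
```

Note that this simple loop still leaves a small gap: a message that arrives just after the timeout fires, but before the instance has really ended, needs separate handling.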

I hope as a follow up to this post I’ll have time enough to show some better singleton designs.  I know posts without pictures can be boring.  =)


BizTalk Siebel Adapter Error Resolution

At work we’ve been seeing the following error with the BizTalk Siebel Adapter (BizTalk Server 2006) quite frequently:

No connection could be made because the target machine actively refused it

The first time it happened, I immediately presumed that the error was accurate, and notified the Siebel system administrator that something seemed to be wrong with the Siebel server. After checking things out he told me I was mistaken, and that everything was fine.

We have a host instance dedicated to the Siebel adapter, so I simply restarted it and found that things returned to normal again… at least for a few more hours. The error would then reoccur, and so forth. This particular error worried me quite a bit because it meant that a crucial business process was halted as a result. In consequence, a great deal of work was required by the business users to manually redo the process that had failed. Ouch.

Over the course of two weeks I tried all sorts of things. I couldn’t find any articles on the internet that described my problem, and I ended up setting up a scheduled task to restart the SiebelHost every 30 minutes. This worked for the most part, but of course was a very sloppy fix. While this was happening I put in a ticket to Microsoft for help.

After 2-3 days I got a call from Microsoft’s support team (Sajid is really helpful by the way), with an idea to try. Here’s the substance of the email:

To fix this issue, create a DWord registry value in the registry for the key HKLM\software\Microsoft\BizTalkAdapters\New Reg Value : StartAgentSleep
Type: DWord
Value : 1000 (Decimal), measured in milliseconds

The value is configurable and can change according to machines and to different users. 1000 is the value which Microsoft had tried with another customer and seemed to work. 1000 ms = 1 sec.

Setting this causes the adapter to wait longer than the default timeout for the browsingagent.exe or runtimeagent.exe process to prepare itself to talk to the adapter.
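If you would rather import the setting than click through regedit, the change can be written as a .reg file. Below is a small Python sketch that renders one; the helper name is hypothetical, and the key path is my reading of the email above (the value sitting directly under the BizTalkAdapters key), so verify it against your own server before importing anything.

```python
def reg_dword_file(key_path, value_name, value):
    """Render the text of a .reg file that sets one DWORD value.
    .reg files store DWORDs as eight hexadecimal digits."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        "[{}]\n"
        '"{}"=dword:{:08x}\n'
    ).format(key_path, value_name, value)


if __name__ == "__main__":
    # Assumed key path, taken from the email quoted above.
    key = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\BizTalkAdapters"
    print(reg_dword_file(key, "StartAgentSleep", 1000))
```

The decimal value 1000 shows up as 000003e8 in the file, which is easy to misread as a different number if you are not expecting hexadecimal.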

So I gave it a whirl and guess what?  Problem solved. We’ve now been up for about five consecutive days without seeing the problem reoccur.


Health and Activity Tracking Inaccurate?

Just the other day we had a new BizTalk application move into production. However, shortly after, problems began where the orchestration that was running encountered an error. I opened up Health and Activity Tracking (HAT), and saw this:

Health and Activity Tracking Image

This error was found in the suspended orchestration and in the Application Log:

System.Data.OracleClient.OracleException: ORA-01017: invalid username/password; logon denied
at System.Data.OracleClient.OracleException.Check(OciErrorHandle errorHandle, Int32 rc)
at System.Data.OracleClient.OracleInternalConnection.OpenOnLocalTransaction(String userName, String password, String serverName, Boolean integratedSecurity, Boolean unicode, Boolean omitOracleConnectionName)
at System.Data.OracleClient.OracleInternalConnection..ctor(OracleConnectionString connectionOptions)
at System.Data.OracleClient.OracleConnectionFactory.CreateConnection(DbConnectionOptions options, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningObject)
at System.Data.ProviderBase.DbConnectionFactory.CreatePooledConnection(DbConnection owningConnection, DbConnectionPool pool, DbConnectionOptions options)
at System.Data.ProviderBase.DbConnectionPool.CreateObject(DbConnection owningObject)
etc.

Because HAT showed a send shape as being the last thing to start execution, I immediately assumed there was a problem with the Oracle Send Port associated with that send shape. A teammate, also trying to solve the problem, assumed the same. We were both frustrated because this problem hadn’t happened in dev or test – what was different now? We tried many things to try and get the production problem solved, but our efforts were in vain. The system ended up being rolled back and the old infrastructure put back in place – a true disaster.

A day later, after some of the pain and sorrow subsided, I contacted Microsoft. I was put in contact with a great support person, who looked at the problem and started asking some questions. I was a little frustrated with his questions at first, but now I know why he was asking them. He kept asking about database connections we were opening in C# code, as opposed to what seemed obvious to me, that this was related to the Oracle Adapter Static One-Way Send Port. He eventually explained his stubbornness for not accepting the problem at face value: 1) the error didn’t indicate anything about the Oracle Adapter and 2) HAT is not always accurate! He mentioned that he has seen many instances where HAT does not actually show the exact node where the problem is occurring; furthermore, he has even seen HAT debug values to be incorrect!

I couldn’t believe what I was hearing! Once he said this, I started to wonder… well, if the Oracle Adapter send port is not the problem (which would make sense since an earlier call to the same database via a solicit-response Oracle send port had worked), what could it be? I began examining the next node after the send, and found the node SiebelUpdate, which was trying to update an Oracle database via C# code. Immediately things started to click – right before deployment, I had been asked to use a new username/password for the Oracle Siebel connection. I had tested all of the previous credentials to avoid this kind of problem (there were 4 data sources being connected to in about 7 or 8 different ways), but I hadn’t tested the new one that had been given to me. I tried opening up the Siebel database using the credentials I had been given, and guess what, same error.

So here’s what I learned:

1. Don’t trust HAT. If we had known this, I’m pretty sure we’d have figured this out prior to the rollback.
2. Look carefully at the error message – careful inspection showed that it wasn’t related to the Oracle Adapter.
3. Microsoft DOES use the ODBC connection for the Oracle Adapter (we had enabled logging but didn’t see anything – now I know why).
4. Don’t mistrust good old Oracle errors. I had, because there seemed to be no other explanation based on what was shown in HAT.

