BizTalk Performance Testing Tips

In a lot of BizTalk Server environments, performance is critical. It is not uncommon to hear from a client that they need to be able to process a specific volume of transactions in a certain time window. Unfortunately, that is usually followed by the question: "So, how much hardware do I need?"

There isn't any way to answer that question outright because there are too many unknowns. How big are the messages? How complex are the pipelines and maps? What about the orchestrations, if any? What other systems or adapters will be involved?

There are several strategies for finding out how much hardware you need. The first is a 'grow as you can' model. You deploy your system on a good foundation: a solid SQL Server and a single BizTalk server or a pair of them. Once in production, slowly increase the traffic or the number of consumers of the business process. As limits are reached, add more servers to the BizTalk group. This is a very organic model, and it allows you to add only what you need.

This model won't work in some enterprises where budgeting and accounting are more important than the fitness of the solution. In these cases, they want a number up front (even before you could fairly SWAG it) and you have to stick with it. To that end, a lot of IT groups overestimate the cost of the project, almost to a negligent degree, and create a giant plan. This either leads to the company spending more than it should (it's always a bad thing to have to go back for a second dip in the money well in these types of organizations), or the project gets canceled for costing too much money.

There is another way, and it is sort of a blend. You can prototype some of the processes on trial hardware, and then extrapolate from there to determine the cost of the project. You will still end up with estimated figures, but they will be based on results, not on beer and dreams.

Microsoft has finally made public a document called Managing a Successful Performance Lab, which walks you through how to run a performance lab test.

I don't want to repeat what is clearly laid out in the paper, but I do want to add some of my own thoughts and some high-level guidance.

First, make sure that you select a business process that is representative of the work the system will be handling. Build that process out as you would for production, but don't go so far that you end up actually writing the system. It is OK to cut corners; this is a prototype. Just make sure that you involve the adapters and third-party systems you will use in production. Which adapters you use can really affect the system's performance.

Make sure you not only find a good process to test, but also set realistic expectations about the traffic it will need to support. For example, a system might sit idle through most of the day, and then have to process large batch files at night as submissions are sent in from partners. Or, the system might receive small requests throughout the day (web service calls, for example), and the occasional floodgate batch (5-10 a month). So, sit down and think through the traffic shape for the system, perhaps capturing it as data like the sketch below.
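To make that concrete, here is a minimal sketch of how you might record a traffic profile so the test plan and the load tool configuration stay in sync. Every name and number in it is an illustrative assumption, not a recommendation:

```python
# Hypothetical traffic profile for the test plan; every figure here is an
# assumption you would replace with numbers from your own analysis.
TRAFFIC_PROFILE = [
    # name,               share of volume, typical batch size, cadence
    ("realtime_requests", 0.20,            1,   "steady all day"),
    ("partner_batches",   0.50,           10,   "evenly spaced, business hours"),
    ("error_batches",     0.10,           10,   "random"),
    ("floodgate",         0.20,          500,   "5-10 a month, overnight"),
]

if __name__ == "__main__":
    for name, share, batch, cadence in TRAFFIC_PROFILE:
        print(f"{name}: {share:.0%} of volume, ~{batch} msgs/batch, {cadence}")
```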

Then, set up your test environment. You should have at least two servers: one for SQL Server, and one for BizTalk. If you plan on having dedicated hosts (send, receive, exec), then extra boxes will help you model what you think your final production physical environment might look like.

Run the BPA! Download and run the BizTalk Best Practices Analyzer. Fix the first thing on the list, and then run it again. Repeat as necessary. This is a fabulous tool, and it helps a great deal. Every issue it finds has a link to specific instructions for fixing it. It will flag some practices that you can't or won't want to follow (false positives), but it will catch a lot of configuration and environmental issues for you, including the MS DTC trap, which is probably the most common issue asked about on the support groups.

Develop a test plan! Boy, I sound like a PM saying that. Plan out what tests you will run and what they will entail. Develop a way to track results. The key to running good tests is to only ever CHANGE ONE THING AT A TIME. If you change more than one thing, you won't be able to tell what impact each change truly had. Again, only change ONE THING AT A TIME. It will be tempting to cut corners, but if you are going to do that, you might as well not run the performance tests at all, forge the numbers, spend the budget at Best Buy, and call it a day.

The test plan should also include the tasks to be done at the beginning and end of each session, run, and iteration. The steps should be followed ruthlessly. Again, human laziness is your enemy here. Your best bet is to script or automate as much of this as possible. You should also have a printed checklist and a pencil. A team of people will be better at this than one geek in a corner as well; they can keep each other honest.
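As a starting point, here is a minimal sketch of that kind of automation. The step names are hypothetical placeholders for whatever your own pre- and post-run tasks are, not a prescribed list:

```python
import datetime

# Hypothetical pre/post-run steps; replace the bodies with your own scripts.
def clear_drop_folders(): ...     # assumption: empty your receive/send folders
def verify_hosts_running(): ...   # assumption: however you check host instances
def archive_counter_logs(): ...   # assumption: copy perf logs to a share

PRE_RUN  = [clear_drop_folders, verify_hosts_running]
POST_RUN = [archive_counter_logs]

def execute(steps, label):
    """Run each step in order and log it, so no one 'forgets' a step."""
    for step in steps:
        print(f"{datetime.datetime.now():%H:%M:%S} {label}: {step.__name__}")
        step()

# execute(PRE_RUN, "pre-run"); ...run the test...; execute(POST_RUN, "post-run")
```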

The test plan should include sample messages, and the performance counters that will be tracked for each run. You can always add more perf counters based on what you are looking for. The Perf Lab whitepaper can get you going in the right direction, but here are some you should definitely include:

1. Spool depth

2. Throttling levels in the system

3. CPU %

4. Memory %

5. % Disk Idle Time on the SQL Server

We usually track about 100 counters in our tests as a baseline. A separate machine should be used to collect the counters. After each test, the perf counter log should be saved for later reference. We usually assign a number to each test run, and name the log file with that number. That number is then used in Excel to track the results.
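One way to automate that collection is a small wrapper around the built-in Windows typeperf utility. This is only a sketch: it assumes a counters.txt file listing the counters you chose above, and it takes the run number as a parameter so the log file gets the right name:

```python
import subprocess
import sys

def collect_counters(run_number: int, duration_secs: int, interval_secs: int = 5):
    """Collect the counters listed in counters.txt into a run-numbered CSV
    using the Windows typeperf tool, run from the monitoring box."""
    log_name = f"run_{run_number:03d}.csv"
    samples = duration_secs // interval_secs
    subprocess.run(
        ["typeperf", "-cf", "counters.txt",   # file listing counter paths
         "-si", str(interval_secs),           # sample interval in seconds
         "-sc", str(samples),                 # number of samples to take
         "-o", log_name, "-y"],               # output file, overwrite without prompting
        check=True,
    )
    return log_name

if __name__ == "__main__":
    print(collect_counters(int(sys.argv[1]), duration_secs=600))
```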

The best way to put a load on your system is to use a tool from Microsoft called LoadGen. It is very configurable and extensible. We usually configure it to drop files into the pickup folder at a certain rate for a specific period of time.
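If you just want to see the idea (or smoke-test a run before wiring up LoadGen), here is a minimal Python sketch that drops copies of a sample message into a pickup folder at a fixed rate. The paths and rates are placeholder assumptions, and this is in no way a substitute for LoadGen's own configuration:

```python
import shutil
import time
import uuid
from pathlib import Path

def drop_files(sample: str, pickup_dir: str, per_minute: int, minutes: int):
    """Copy the sample message into the pickup folder at a steady rate,
    giving each copy a unique name so the file adapter picks them all up."""
    pickup = Path(pickup_dir)
    interval = 60.0 / per_minute
    for _ in range(per_minute * minutes):
        shutil.copy(sample, pickup / f"{uuid.uuid4()}.xml")
        time.sleep(interval)

# Example (paths are placeholders):
# drop_files(r"C:\perflab\samples\order_10tx.xml", r"\\btsrecv\pickup",
#            per_minute=100, minutes=10)
```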

We usually break up the test plan into runs. Each run represents a specific traffic shape. For example, we might start with batches of 100% good messages (no errors) with 10 transactions per batch. Each iteration of that run then places progressively more load on the system. Each run should follow the same progression, usually 1, 10, 20, 50, 100, 250, 500, 1000, etc. The next run would have a different traffic shape. We will usually do several runs that differ only in the number of transactions per file: start with 10, then 100, then 500, etc.

The traffic shape patterns should become more complex in successive phases of testing. We usually start with simple batches, and then evolve the LoadGen configuration to generate more realistic scenarios with blends of traffic. For example, 20% of the traffic is steady and in small batches (real-time requests), 50% is regular but spaced-out medium-sized messages, 10% has significant errors, and the rest arrives as a floodgate scenario. This mix should match the traffic shapes you worked out in your test plan.
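Here is a small sketch of how the run/iteration matrix might be captured so the load scripts and the results spreadsheet agree on run numbers. The shapes and progressions are illustrative assumptions only:

```python
import csv
import itertools

# Illustrative run definitions: each run is a traffic shape, each iteration
# ramps the load along the same progression.
PROGRESSION = [1, 10, 20, 50, 100, 250, 500, 1000]
RUNS = {
    1: "good messages, 10 transactions per file",
    2: "good messages, 100 transactions per file",
    3: "good messages, 500 transactions per file",
    4: "blended: 20% realtime / 50% medium batches / 10% errors / 20% floodgate",
}

def write_run_matrix(path: str = "run_matrix.csv"):
    """Emit one row per (run, iteration) so each test gets a stable number."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["test_id", "run", "shape", "files_per_minute"])
        for test_id, (run, rate) in enumerate(
                itertools.product(RUNS, PROGRESSION), start=1):
            writer.writerow([test_id, run, RUNS[run], rate])

if __name__ == "__main__":
    write_run_matrix()
```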

Before each test, the various BizTalk databases should be cleaned out. There are scripts that can do this for you. You don't want later runs to be affected by slower inserts because the tracking database has grown very large. You should also reset any other systems that you are hitting. For example, if you are dropping failed batches to a SharePoint site for manual repair, that document library should be cleaned out after each test. Your goal is for each test to start with the same environment so that the results are reliable. With that in mind, you should pre-grow your SQL databases before testing so that the early test runs don't pay the runtime auto-grow tax on SQL performance.
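A sketch of what that reset might look like once scripted. The cleanup script paths, server name, and folders below are placeholders for whatever your environment actually uses, and you should confirm that any MsgBox or tracking cleanup script is supported for your BizTalk version before relying on it:

```python
import subprocess
from pathlib import Path

# Placeholders: point these at your own cleanup scripts and folders.
SQL_SERVER      = "PERFSQL01"
CLEANUP_SCRIPTS = [r"C:\perflab\sql\cleanup_msgbox.sql",
                   r"C:\perflab\sql\purge_tracking.sql"]
DROP_FOLDERS    = [r"\\btsrecv\pickup", r"\\btsrecv\failed_batches"]

def reset_environment():
    """Run the cleanup SQL scripts and empty the file drops so every test
    starts from the same state."""
    for script in CLEANUP_SCRIPTS:
        subprocess.run(["sqlcmd", "-S", SQL_SERVER, "-E", "-i", script],
                       check=True)
    for folder in DROP_FOLDERS:
        for leftover in Path(folder).glob("*"):
            if leftover.is_file():
                leftover.unlink()
```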

Before each test, a simple message should be run through the system to 'prime the pump.' We have found this helps to normalize the test results, making the results of small batches more reliable.

After all of the test runs are completed, you will need to determine a scale factor for the system. This scale factor is used to estimate what the final production environment might be able to sustain. For example, one factor to account for the real process being twice as complex to execute, and a second factor to account for dual SQL Servers and four quad-processor servers in the BizTalk group.
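As a toy example of that arithmetic (the measured rate and both factors below are made up; deriving honest factors for your own hardware and process complexity is the hard part):

```python
# Illustrative only: the measured rate and the factors are assumptions.
measured_msgs_per_sec = 120   # what the lab prototype sustained
complexity_factor     = 0.5   # real process ~2x as complex, so half the rate
scale_out_factor      = 3.0   # rough credit for more/bigger BizTalk and SQL boxes

projected = measured_msgs_per_sec * complexity_factor * scale_out_factor
print(f"Projected sustainable rate: ~{projected:.0f} msgs/sec")  # ~180 msgs/sec
```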

Before the test you should become very comfortable with the topic of 'Maximum Sustainable Throughput' (MST) for your system. There are several blogs out there on this topic, and it is also covered in the Performance Lab whitepaper mentioned above.

In short, MST is how many transactions your system can handle without building up a backlog it can't recover from. This is different from how many transactions can be completed per second, because each part of the system operates at a different speed. Many times, after a perf lab is completed, a second round is run specifically to find the MST for that system. These tests are usually set up to overdrive different parts of the system to narrow down and define the MST.
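A back-of-the-envelope illustration of the difference, with made-up numbers: the receive side can outrun the processing side for a while, but the real question is whether the backlog drains before the next peak arrives.

```python
# Made-up numbers to show why peak intake rate is not the same as MST.
intake_rate = 200       # msgs/sec arriving during a 1-hour floodgate
drain_rate  = 150       # msgs/sec the slowest stage can actually complete
peak_secs   = 3600      # length of the floodgate
idle_secs   = 4 * 3600  # quiet time before the next peak

backlog  = (intake_rate - drain_rate) * peak_secs   # 180,000 msgs in the Spool
recovery = backlog / drain_rate                     # 1,200 secs to drain
print(f"Backlog: {backlog} msgs, drains in {recovery/60:.0f} min "
      f"({'recoverable' if recovery <= idle_secs else 'not sustainable'})")
```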

A quick list of things to change between runs:

1- Which server in the group is running which host instances. It is well established that breaking send/recv/exec into separate hosts, even on the same box, helps improve performance, because each host instance then gets its own pool of threads and its own memory.

2- Maybe rework the maps, or the intake process on the receive side. A lot of times, if performance is critical, a custom pipeline component will need to be developed.

3- Rework the orchestrations to minimize persistence points.

4- Tune the system in the host settings screens, or in the registry, to better suit the majority of your traffic. BizTalk comes out of the box tuned very well for the typical business message, but if you end up processing a high volume of tiny messages, or very large messages, then you can get more performance by adjusting some of the tuning parameters.

That was a longer post than I expected, and I think I could keep on going. Maybe I will expand further in future postings, maybe with sample deliverables.
