Queues in Azure
Many modern systems are now being designed with SOA principles in mind. This usually means they are designed as a composite application of several services working together. As part of this structure, you usually need a way for the different services to communicate.
A common way is to use an Enterprise Service Bus, or even just naked, direct SOAP calls. This works when the systems are synchronous in nature. But if the service you are leveraging is asynchronous, meaning it is more like a back-end processor or bulk processor, then you are likely going to end up working with queues. The advantage of queues is that they help enforce some loose coupling in your architecture. Just make sure that you pick a queueing technology that supports the protocols the consumers will need (i.e. SOAP, REST, COM+, etc.).
If you are working with Azure, then you can easily leverage the queue infrastructure already built into the storage fabric of Azure. Before you dive in, there are a few things you should know about how queues work, and some of the design limitations they have.
Queues are FIFO. That means the first message in is the first message out. Much like a line for tickets at the movie theatre for Star Trek: the first nerd in line gets the first ticket, and so on.
Because it is possible for the processing agent that took the top message to fail, it is possible for a message to be forgotten about in an architecture like this. To handle this, most queue servers have the ability to mark a message as read, but not actually delete it until the processor reports success. In this manner, if processing fails, your code can find stale read messages and reprocess them after a timeout period. The ‘read’ state also keeps other processing nodes from picking up the same message and processing it a second time. It is very common in this scenario to have several processing nodes reading messages off of the same queue. The queue becomes an abstraction for talking with the group of nodes, and is an easy way to balance the load across the nodes.
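To make the ‘read then delete on success’ idea concrete, here is a toy in-memory sketch of that pattern. This is not the Azure API, just the concept: a message handed to a processor becomes invisible for a while, a stale receipt can no longer delete it, and a timed-out message gets redelivered.

```python
import time
import uuid

class SimpleQueue:
    """Toy in-memory queue illustrating the mark-as-read / delete-on-success
    pattern described above. Purely conceptual, not the Azure API."""

    def __init__(self):
        # Each message: {id, body, invisible_until, receipt}
        self._messages = []

    def put(self, body):
        self._messages.append({"id": str(uuid.uuid4()), "body": body,
                               "invisible_until": 0.0, "receipt": None})

    def get(self, visibility_timeout=30.0, now=None):
        """Return the first visible message as (id, receipt, body), hiding it
        for visibility_timeout seconds; None if nothing is visible."""
        now = time.time() if now is None else now
        for m in self._messages:
            if m["invisible_until"] <= now:
                m["invisible_until"] = now + visibility_timeout
                m["receipt"] = str(uuid.uuid4())  # stands in for a pop receipt
                return m["id"], m["receipt"], m["body"]
        return None

    def delete(self, message_id, receipt):
        """Delete only if the receipt is still current, i.e. the message was
        not timed out and handed to another processor in the meantime."""
        for m in self._messages:
            if m["id"] == message_id and m["receipt"] == receipt:
                self._messages.remove(m)
                return True
        return False
```

Note how a second `get` while the message is hidden returns nothing, and how a delete with a stale receipt is rejected: that is exactly what keeps two nodes from both "completing" the same message.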
Queues are a one-way asynchronous messaging system. I can use a queue to send you a message, but there has to be some other mechanism for any return message. Sometimes this is just a second queue, but more likely there is some other out-of-band signaling going on. Perhaps the sudden appearance of data in your database, a flag being set, or a flat file that gets picked up the next morning. Another common return path is for the back-end processor to call a lightweight service (REST or SOAP) that merely reports that the specific message has been processed. For example, the contract might include an order id, and a final status (completed, shipped, error, pineapple, etc.).
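A minimal sketch of what such a callback payload might look like. The field names and the allowed status values here are made up for the example; the real contract is whatever you and the processor agree on.

```python
import json

# Illustrative only -- these names are not part of any real Azure contract.
ALLOWED_STATUSES = {"completed", "shipped", "error"}

def build_status_report(order_id, status):
    """Build the body a back-end processor might POST to a lightweight
    REST callback service to report that a message has been handled."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unknown status: {status}")
    return json.dumps({"orderId": order_id, "status": status})
```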
You don’t just dump a giant message on the queue; this will surely lead to bad performance, regardless of which queue server you are using (Azure, MSMQ, MQ Series, etc.). If you are just processing an order from a web site, then it might be OK. But if you are processing an image, or something of real size, you are better off following a pattern called a ‘control message’ or ‘work request message’. In this pattern, you drop off the large part of the payload in some common store. This could be a common file system, a common database, or the BLOB storage in Azure. Then you put a message in the queue that tells the back-end processor what needs to be done, and which item in the common store to use.
In the ever-common image thumbnail generator scenario, you might put the uploaded image into BLOB storage, and then put a message in the queue that states the name of the item in BLOB storage, the expected thumbnail dimensions, and an account code to bill the work to. The back-end processor would then pick up the message, go fetch the image, do the work, bill the proper code, and then dump the thumbnail back into the common storage. The consuming website then just keeps polling for the particular thumbnail filename to see when it is done, or you could leverage one of the callback mechanisms mentioned above.
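The control message for that thumbnail scenario could be sketched like this. The element names (`ThumbnailRequest`, `BlobName`, etc.) are invented for the example; the point is that only this small XML goes on the queue, while the image stays in BLOB storage.

```python
import xml.etree.ElementTree as ET

def build_thumbnail_request(blob_name, width, height, account_code):
    """Build the small 'work request' XML that goes on the queue. The image
    itself stays in BLOB storage; the message only points at it."""
    root = ET.Element("ThumbnailRequest")
    ET.SubElement(root, "BlobName").text = blob_name
    ET.SubElement(root, "Width").text = str(width)
    ET.SubElement(root, "Height").text = str(height)
    ET.SubElement(root, "AccountCode").text = account_code
    return ET.tostring(root, encoding="unicode")
```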
It is common to have one queue per message pattern, meaning all messages going into the queue should either always be bound for the same destination (all messages pertaining to customer records), or be of the same verb (process this image, produce report). The downside to this is that it is very easy to end up with a proliferation of queues. This leads to a management nightmare, as well as a lot of traffic.
In Azure, you can create as many named queues as you want. When you put a message onto a queue, it can be no larger than 8KB, and must be XML. This keeps the platform fast and super scalable. A queue can theoretically hold as many messages as you want to put in it, but I haven’t done any performance or scalability testing on the Azure queue to see if this holds up.
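Given that limit, it is worth checking the encoded size of a message before you try to enqueue it; anything bigger belongs in BLOB storage with a small control message pointing at it. A simple guard might look like:

```python
MAX_MESSAGE_BYTES = 8 * 1024  # the 8KB limit mentioned above

def check_message_size(xml_body: str) -> None:
    """Raise if the message exceeds the 8KB queue limit. Oversized payloads
    should go to BLOB storage, referenced by a small control message."""
    size = len(xml_body.encode("utf-8"))
    if size > MAX_MESSAGE_BYTES:
        raise ValueError(f"message is {size} bytes; limit is {MAX_MESSAGE_BYTES}")
```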
The API is RESTful, and you can place or read items from the queue from anywhere that can make that REST call; it doesn’t have to be code running in an Azure role. This means that you can host your back-end processor in the cloud, to get the dynamic scalability to respond to spike events, but wire up your preexisting applications to feed that queue.
What is the address of your queue? It depends on what you name it, and your account name for Azure. Perhaps you named your queue ImageProcessing, and your account name is BHP. In that case, the address for the queue would be: http://BHP.queue.core.windows.net/ImageProcessing. As you make REST calls to this address, remember that you are addressing the queue at large, meaning a DELETE command would delete the whole queue. To add a message to the queue, you need to extend the URI a little, to something like this: http://BHP.queue.core.windows.net/ImageProcessing/messages. Of course there would be parameters that hold the actual message content you wanted to add.
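A couple of trivial helpers make the two address shapes explicit, using the account and queue names from the example above:

```python
def queue_url(account: str, queue: str) -> str:
    """Base address of a named queue; a DELETE here removes the whole queue."""
    return f"http://{account}.queue.core.windows.net/{queue}"

def messages_url(account: str, queue: str) -> str:
    """Address for adding and reading individual messages on that queue."""
    return queue_url(account, queue) + "/messages"
```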
When you get a message from the queue (an HTTP GET against the URL above), you have two optional parameters you can define. The first lets you fetch more than one message at a time. This is important for scale reasons, when the overhead of fetching a message is high; in this case, grabbing a batch of messages is more efficient. The second parameter allows you to set the visibility timeout, up to two hours. If you don’t delete the message before this timeout, then it will reset back to visible, allowing someone else to pick up the message.
When you GET a message, you are given a pop receipt id. This id is needed in order to DELETE the message when you have completed its processing. You will also need to supply the message id itself (which is a GUID). This is to make sure you delete the proper message off of the queue, and that you are the most recent recipient of the message. Remember, in a timeout scenario, the message could be revived, and given to another processor. If the timeout, which can be set on the GET, expires, then the pop receipt will expire as well. This keeps you from running into conflicts when things go haywire.
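Sketching the GET/DELETE lifecycle as URL construction makes the parameters visible. The parameter names `numofmessages`, `visibilitytimeout`, and `popreceipt` follow the queue REST API as I understand it; verify them against the current documentation before relying on them.

```python
from urllib.parse import urlencode

# Example address from earlier in the post.
BASE = "http://BHP.queue.core.windows.net/ImageProcessing/messages"

def get_messages_url(count=1, visibility_timeout=None):
    """GET url for reading messages: batch size plus optional visibility
    timeout (seconds). Parameter names assumed from the REST API docs."""
    params = {"numofmessages": count}
    if visibility_timeout is not None:
        params["visibilitytimeout"] = visibility_timeout
    return f"{BASE}?{urlencode(params)}"

def delete_message_url(message_id, pop_receipt):
    """DELETE url for one message; needs both the message id and the pop
    receipt that came back from the GET."""
    return f"{BASE}/{message_id}?{urlencode({'popreceipt': pop_receipt})}"
```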
If people want, I can code up a sample, and walk through it.