Using BLOBs in Azure to store homemade BSG episodes

The other day I discussed using Queues in Azure, and why you would want to. Today, we will talk about using BLOBs.

First, a bit about BLOBs. BLOB stands for Binary Large OBject, and it is a way to store binary data. Up until now, whenever I used BLOBs it was as a column type in SQL Server or Oracle. This was always challenging because the APIs to store and read BLOBs were a pain to use, at least for me. The other problem was maintenance on the database server: your database gets very big, very quickly, when you store images and videos in it. There may be some very good reasons for you to do this, but for me, in the past, the tax was too high. The approach I generally took was to store the image in a folder somewhere, and store the path and filename in the database. This created its own maintenance issue, in that we had to back up the file folder and make sure it stayed transactionally in sync with the database backup. That is a whole different blog post.

BUT, if you are looking at using BLOBs in a database, check out the new FILESTREAM feature in SQL Server 2008. I wish I had had that about five years ago.

This common pain is what causes most people to wince when they hear BLOB; I know I do. But BLOBs are important. They are a great way to store all of that unstructured data we have. Our world is becoming rich-media centric, in what we do and what we store.

BLOBs don’t have to be pictures or videos. They can be any binary stream: perhaps a large catalog detail file, or a backup history. It can really be anything. BLOBs are opaque, though, so you usually can’t scan them during a query or use them in an index. For that reason, make sure you store some metadata with them.

BLOBs are one of the three pillars of the Azure storage fabric: queues, tables, and BLOBs. Any data saved in the storage fabric is stored as three different replicas, for reliability and scalability reasons. This storage is also shared across your account, so one node can write a file into BLOB storage and another node can read it. This is very similar to using file storage in an on-premises application.

Within your account, you can organize your Blobs with containers. These are just a simple mechanism to segment your Blob storage and make it easier to work with. At this point, it isn’t possible to nest containers the way you would nest folders on your file system.
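Since containers can’t nest, a common workaround is to put slashes in the Blob names themselves, which fakes a folder hierarchy inside one flat container. A minimal sketch (the names here are just placeholders, not anything Azure requires):

```python
# Containers cannot nest, but Blob names may contain "/" characters,
# so you can simulate a folder hierarchy inside a single container.

def fake_folder_name(*segments: str) -> str:
    """Join path-like segments into one flat Blob name."""
    return "/".join(segments)

name = fake_folder_name("episodes", "season1", "pilot.wmv")
print(name)  # episodes/season1/pilot.wmv
```

Tools that list Blobs can then treat the slashes as virtual directory separators, even though the storage itself is flat.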

Once I have created a container and a Blob, accessing it is as easy as browsing to a URL like this:

http://BHP.blob.core.windows.net/Fingerprints/Prinsoner198
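That address follows a predictable account/container/blob pattern, so you can build it yourself. A quick sketch (the account and Blob names mirror the example above and are only placeholders):

```python
# Sketch: building the REST address for a single Blob. The account
# "BHP" and the container/blob names are placeholders from the example.

def blob_url(account: str, container: str, blob: str) -> str:
    """Build the HTTP address of one Blob in the storage fabric."""
    return f"http://{account}.blob.core.windows.net/{container}/{blob}"

url = blob_url("BHP", "Fingerprints", "Prinsoner198")
print(url)  # http://BHP.blob.core.windows.net/Fingerprints/Prinsoner198

# Reading it is then a plain HTTP GET (this only works anonymously if
# the container grants public read access):
#
#   import urllib.request
#   data = urllib.request.urlopen(url).read()
```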

Blobs and containers are locked down to be accessible only by the account owner. If you want a container of Blobs to be publicly accessible on your site, you can use access control lists on the container. In this way, you can grant anonymous users read access.

In your application code, you will want to reference the Microsoft.Samples.ServiceHosting library. This DLL holds some nice classes that make working with Azure storage easier. You can find it in the Azure samples folder that comes with the SDK.

To store a Blob in a known container, you would use the same URL as above, but with a PUT verb instead of a GET. When you ‘put’ a Blob, the size is limited to 64MB. If your file is bigger than that, you can use the Put Block operation, which lets you upload the Blob in 4MB blocks until you are done. The maximum size of any Blob is currently 50GB. That is pretty big. Want to know why it’s 50GB, and not 51GB or something like that? Because they needed a number, and no one will ever need more than 50GB for a single Blob. :) If it were me, I would have made it just large enough to hold a complete Blu-ray movie. You know, for when you want to store your homemade Blu-ray movies in the cloud.
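The block-upload idea is simple to sketch: carve the payload into 4MB slices and give each one an id, then put them up one at a time. A rough illustration (the block-id format here is made up; the real API takes ids of your choosing):

```python
# Sketch: splitting an oversized payload into 4MB blocks, the way a
# block-based upload expects. Block ids below are illustrative only.

BLOCK_SIZE = 4 * 1024 * 1024  # 4MB per block

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Yield (block_id, chunk) pairs, one per eventual Put Block call."""
    for i in range(0, len(data), block_size):
        yield f"block-{i // block_size:05d}", data[i : i + block_size]

# A 10MB payload becomes three blocks: 4MB + 4MB + 2MB.
payload = b"\x00" * (10 * 1024 * 1024)
blocks = list(split_into_blocks(payload))
print(len(blocks))  # 3
```

After the blocks are uploaded, you commit the list of block ids and the storage fabric assembles them into the final Blob.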

One of the scenarios you might use Blobs for is storing images and videos (or other user-generated content) on your site. In that case, storing them and displaying them back to your users is pretty simple.

Another common scenario, which ties into the post on queues, is the transitional scenario. In this case, a user might upload a video for processing. Your application would store the video in Blob storage, and then push a work ticket onto the queue. The work ticket would hold only the top-level metadata (user name, transaction id in your db, the name of the Blob). The worker node would pull this off the queue, pick up the Blob, and process it. It might then put the results into a different Blob container, and finally delete the original Blob from storage. Guess what the delete command is. You guessed it: you just change the HTTP verb to DELETE, with the same URL as above. Of course, the user has to have permissions to delete, so don’t worry that some script kiddie is going to start deleting all of your homemade Battlestar Galactica movies off of your site.
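The key to that pattern is that the work ticket carries only a lightweight pointer to the Blob, never the Blob itself. A sketch of what such a ticket might look like (the field names and values are illustrative, not a prescribed format):

```python
import json

# Sketch of the work-ticket pattern: the queue message holds only the
# metadata a worker needs to locate the Blob. Field names are made up.

def make_work_ticket(user: str, transaction_id: int, blob_name: str) -> str:
    """Serialize the minimal pointer the worker needs to find the Blob."""
    return json.dumps({
        "user": user,
        "transaction_id": transaction_id,
        "blob": blob_name,
    })

ticket = make_work_ticket("starbuck", 42, "videos/episode01.wmv")

# The worker would parse the ticket, GET the Blob, process it, PUT the
# result into another container, then issue an HTTP DELETE against the
# original Blob's URL.
worker_view = json.loads(ticket)
print(worker_view["blob"])  # videos/episode01.wmv
```

Keeping the ticket small matters because queue messages have tight size limits; the Blob storage does the heavy lifting while the queue just coordinates.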
