Disk Defragmentation in Windows 7

I have always been interested in the internals of operating systems, and the science behind the engineering. One of those aspects is reading and writing to storage, and how that can be done in an efficient and performant manner.

The Engineering Windows 7 blog has posted about how they work with hard disks, and how disk defragmentation has changed over the versions of Windows. It is a very interesting read. I wouldn’t dig into the comments; as usual, they are filled with trolls and flame wars. Oh well. If you haven’t been reading this blog, and you have any interest in OSs or in Windows 7, I highly recommend it. It is an open and honest discussion of the building of Windows 7 by the people actually building it. The posts are detailed, and explain some of the decisions that the Windows team has to make, and how they make them. Balancing all of the different interests, use models, and users is quite challenging.

Here is my short summary, but you really should just go read the real thing.

Because the hard disk is so much slower than the CPU, how the OS interacts with the disk is very important. Some of the key principles for this are very similar to the guidance on how to use services. From their post:

    1. Perform less I/O, i.e. try and minimize the number of times a disk read or write request is issued.
    2. When I/O is issued, transfer data in relatively large chunks, i.e. read or write in bulk.
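To make those two principles concrete, here is a small sketch (my own illustration, not anything from the post) that reads the same file in tiny requests and then in bulk. The file name and chunk sizes are arbitrary choices for the demo:

```python
import os
import tempfile

# Hypothetical demo: the same file read in small requests vs. large
# chunks. The sizes here are arbitrary illustrations, not anything
# Windows itself uses.
def read_in_chunks(path, chunk_size):
    calls = 0
    with open(path, "rb") as f:
        while f.read(chunk_size):
            calls += 1
    return calls

# Build a small throwaway file to measure against.
path = os.path.join(tempfile.gettempdir(), "io_demo.bin")
with open(path, "wb") as f:
    f.write(os.urandom(4 * 1024 * 1024))  # 4 MB of data

tiny_calls = read_in_chunks(path, 512)          # many small requests
bulk_calls = read_in_chunks(path, 1024 * 1024)  # few large requests
print(tiny_calls, bulk_calls)
os.remove(path)
```

Same data either way, but the bulk version issues a tiny fraction of the read requests, which is exactly the point of the two principles.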

This is very much like with services. You try to make non-chatty services that are chunky in nature, and call them only when needed. Your application’s performance will suffer if you make too many small calls to a service. All of the serialization, deserialization (I love that word, and I just added it to my spell checker), dispatching, and transport costs you latency. This is a bigger problem with SOAP than with REST because of the overhead of a SOAP message, but the concern is still there.

Back to disks, and how slow they are. The team figured out long ago that you want to read big chunks at a time, so even if the user has requested one small part of a file, the OS should read a lot more of it, so that it is ready and in cache. For example, when streaming a music file to the player, the player asks for the first 64K (or however big the buffer is). The OS will request more than that, assuming the user will want the rest.
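The read-ahead idea can be sketched in a few lines. This is my own toy model, not the actual Windows cache manager: when the caller asks for a small range, we fetch a larger window and serve later requests from the cached window. The class name, 256K window, and fake "music file" are all made up for the illustration:

```python
# Hypothetical sketch of read-ahead: when a caller asks for a small
# range, fetch a larger window and satisfy later requests from cache.
class ReadAheadFile:
    def __init__(self, data, readahead=256 * 1024):
        self.data = data          # stands in for the file on disk
        self.readahead = readahead
        self.cache_start = 0
        self.cache = b""
        self.disk_reads = 0       # how often we "went to disk"

    def read(self, offset, size):
        end = offset + size
        cache_end = self.cache_start + len(self.cache)
        if not (self.cache_start <= offset and end <= cache_end):
            # Cache miss: read a bigger window than was requested.
            self.disk_reads += 1
            self.cache_start = offset
            self.cache = self.data[offset:offset + max(size, self.readahead)]
        rel = offset - self.cache_start
        return self.cache[rel:rel + size]

song = bytes(1024 * 1024)             # a fake 1 MB "music file"
f = ReadAheadFile(song)
for i in range(8):                    # the player asks for 64K at a time
    f.read(i * 64 * 1024, 64 * 1024)
print(f.disk_reads)                   # far fewer disk trips than requests
```

Eight 64K requests from the player, but only a couple of trips to the "disk," because each miss pulls in a 256K window.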

This has become an even bigger issue as files have grown in size over the years. Ten years ago, people didn’t have terabyte drives on their desktops, and file sizes were in the KBs, maybe MBs.

In order for the disk to more easily read a file, it helps if the file is allocated in a sequence on the disk itself. If the file is fragmented into chunks all over the disk, the disk will take longer seeking out those pieces and returning them. The practice of making sure the files are assembled together in sequence, and perhaps putting them on the most efficient locations on the disk is known as disk defragmentation. As a concept, it is relatively simple. Move all of the open space to the end of the disk (think of it as a virtual sequential tape). Then rearrange the pieces of the files so that they are all together. It is common for related files (perhaps for OS startup) to be put back to back to make reading them even faster.
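The concept as described above can be shown with a toy model (entirely my own illustration, nothing like the real NTFS defragmenter): treat the disk as a list of blocks, where `None` is free space and letters name the file each block belongs to, then lay each file out contiguously and push the free space to the end.

```python
# Toy illustration of the defrag concept: group each file's blocks
# together and move the free space (None) to the end of the "disk."
# The file names and layout here are made up for the example.
def defragment(disk):
    order = []                      # files in order of first appearance
    for block in disk:
        if block is not None and block not in order:
            order.append(block)
    packed = []
    for name in order:              # lay each file out contiguously
        packed += [b for b in disk if b == name]
    packed += [None] * disk.count(None)   # free space moves to the end
    return packed

fragmented = ["A", None, "B", "A", None, "B", "A"]
print(defragment(fragmented))
# → ['A', 'A', 'A', 'B', 'B', None, None]
```

The real thing is far more involved (it has to move data safely on a live volume), but the before/after shape is the same.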

The Windows team found that a single large file doesn’t have to be in one long sequence. As long as the fragmented chunks are large (bigger than 64MB), they aren’t rearranged, because moving them really wouldn’t help. It is the constant zigzagging for small pieces that costs big during I/O.
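The 64MB rule boils down to a simple filter. A minimal sketch, assuming a function and sample fragment sizes I made up for illustration (the threshold value is the one from the post):

```python
# Hypothetical sketch of the 64 MB rule: only fragments smaller than
# the threshold are worth moving, since the seek cost of a large
# fragment is amortized over a long sequential read anyway.
THRESHOLD = 64 * 1024 * 1024  # 64 MB, per the Engineering Windows 7 post

def fragments_worth_moving(fragment_sizes):
    """Return the fragments small enough that coalescing pays off."""
    return [size for size in fragment_sizes if size < THRESHOLD]

# A file split into two big extents and a scatter of small pieces.
file_fragments = [200 * 1024 * 1024, 80 * 1024 * 1024,
                  4 * 1024 * 1024, 512 * 1024, 16 * 1024]
print(fragments_worth_moving(file_fragments))
```

The two big extents stay where they are; only the small pieces would be candidates for rearranging.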

I remember defragging my hard drives all the time. Especially before I installed a new application (and by application I mean game). By consolidating all of the open space, when I installed said game (I get pangs for Civilization just thinking of this), then all of the files would be contiguous, and grouped together for performance.

In Windows 7, the algorithm has been tuned. There were files in Vista that could not be moved by defrag, usually NTFS metadata files. If you can’t move these, you can’t shrink your disk volume, which is a big issue if you are using VMs or want to rearrange your disk partitions. Windows 7 is now able to move them.

Also, many new laptops, especially netbooks, come with SSDs. Defragmenting them may not matter, and even if it would help, it would cut into the lifespan of the drive. Windows 7 will not automatically schedule a defrag on an SSD.

An interesting note is that auto defrag is not enabled on Windows Server 2008 R2. This is because how file fragmentation affects the system depends on the unique workload of that system, and an experienced system administrator should configure the defrag process to meet those needs. You wouldn’t want a big defrag going on just as the nightly backup starts, for example.

A big change is in the UI. The team has made it possible to schedule the defrag process to your liking, and you can schedule multiple disks in parallel (in Vista they had to run in sequence).

From the post:

[Screenshots of the new Disk Defragmenter UI]

Stop reading my blog, and go read their blog already!
