What is file system fragmentation and why does it take place?
Before concentrating on file system fragmentation, it is first necessary to get familiar with the concept of a file system itself. A file system is the technology that manages data within each separate section of a storage medium (a hard disk drive, SSD, USB stick, etc.), referred to as a partition. Created inside the partition during formatting, the file system comprises the methods and structures that control how and where every piece of data is stored. Thanks to it, data is kept in an orderly manner rather than as a continuous stream of bytes. In addition, the file system maintains records that permit the instant retrieval of files whenever the OS requests them. When the operating system needs to delete certain data from the storage, the file system also provides a mechanism for this operation.
Different OS platforms typically rely on different types of file systems. The most commonly used ones include FAT/FAT32, exFAT, NTFS and ReFS on Windows; APFS and HFS+ on macOS; Ext4, XFS, Btrfs and F2FS on Linux; and UFS and ZFS on BSD, Solaris and other Unix systems. Though all of them perform essentially the same functions, their design and data placement strategies may diverge greatly.
Fragmentation is a condition which occurs when the file system is unable to prepare a contiguous area on the storage to save the whole file to a single location. Consequently, the file gets broken into pieces that are stored in detached parts of the disk. Those individual pieces are called fragments, and files whose fragments are not placed next to each other are regarded as fragmented. To read such a file, the file system must access each of its fragments in sequence; for this purpose it keeps special service information (metadata), which, among other things, contains pointers to those related fragments.
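The role of those fragment pointers can be sketched in a few lines of Python. The structures below are purely illustrative, not those of any real file system: the "disk" is a flat byte array, and the metadata is simply an ordered list of (start, length) pointers.

```python
# Illustrative model: a tiny "disk" and per-file metadata that points
# to the fragments holding the file's content.

disk = bytearray(64)          # 64 one-byte blocks
disk[10:14] = b"Hell"         # fragment 1 of the file
disk[30:36] = b"o, wor"       # fragment 2, stored far from fragment 1
disk[50:53] = b"ld!"          # fragment 3

# Metadata: ordered (start, length) pointers to the file's fragments.
fragments = [(10, 4), (30, 6), (50, 3)]

def read_file(disk, fragments):
    """Follow the fragment pointers in order to reassemble the file."""
    return b"".join(bytes(disk[start:start + length])
                    for start, length in fragments)

print(read_file(disk, fragments))  # b'Hello, world!'
```

As long as this metadata survives, the file can be reassembled correctly no matter how scattered its fragments are, which is why the loss of such records matters so much for data recovery, as discussed later.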
Ideally, the fragments constituting a single file should sit as closely together as possible. This is not usually a problem on a brand-new drive with plenty of empty space. Yet, over time, as files are created, modified and deleted, gaps appear between them, and these gaps are then filled with new data. When incoming files are small, they easily fit into the available gaps. But, more often than not, a file will be bigger than the largest vacant gap. The size of an existing file may also grow when no free gap is adjacent to it. To perform the write without delay, the file system tends to allocate the data wherever it can find room at that moment, leaving the file’s fragments scattered around the storage.
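This splitting behavior can be simulated with a simple first-fit allocator over a free-space map. The function below is a hypothetical sketch, not any real file system's algorithm: when no single gap can hold the file, the file is spread across several gaps and ends up fragmented.

```python
# Hypothetical sketch: first-fit allocation over a list of free gaps.
# A file larger than the largest gap is split into fragments.

def allocate(gaps, size):
    """Allocate `size` blocks from `gaps` (a list of [start, length]).
    Returns the (start, length) fragments used for the file."""
    fragments = []
    for gap in gaps:
        if size == 0:
            break
        take = min(gap[1], size)     # take as much as this gap offers
        fragments.append((gap[0], take))
        gap[0] += take               # shrink the gap accordingly
        gap[1] -= take
        size -= take
    return fragments

# Free space left behind by earlier deletions: three scattered gaps.
gaps = [[10, 4], [30, 6], [50, 8]]
print(allocate(gaps, 12))  # [(10, 4), (30, 6), (50, 2)] -> 3 fragments
```

A 12-block file here cannot fit into any single gap (the largest holds 8 blocks), so it comes out in three pieces, exactly the situation described above.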
Overall, fragmentation is more characteristic of older-generation file systems, such as Microsoft’s FAT/FAT32. Modern formats seek to keep it to a minimum by implementing various techniques:
Files are stored in contiguous areas called extents. An extent is represented by the starting address of such an area and its length. When possible, the algorithm will pick a single extent that provides the space needed for the file’s content, or will at least use a minimal number of extents. Typical examples of extent-based file systems include NTFS, APFS, HFS+, Ext4, XFS, and Btrfs.
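The preference for a single sufficient extent, with a fallback to as few extents as possible, can be sketched as follows. This is a simplified, hypothetical allocator, not the actual strategy of any of the file systems named above.

```python
# Hypothetical sketch of extent-based allocation: prefer one extent that
# fits the whole file; otherwise greedily take the largest free extents
# so the file ends up in as few pieces as possible.

def allocate_extents(free, size):
    """`free` is a list of (start, length) free extents.
    Returns the extents chosen for a file of `size` blocks."""
    # Best case: the smallest single extent that can hold the file.
    fitting = [e for e in free if e[1] >= size]
    if fitting:
        start, _ = min(fitting, key=lambda e: e[1])
        return [(start, size)]
    # Otherwise, use the largest extents first to minimize their count.
    chosen = []
    for start, length in sorted(free, key=lambda e: -e[1]):
        if size == 0:
            break
        take = min(length, size)
        chosen.append((start, take))
        size -= take
    return chosen

free = [(100, 16), (40, 64), (200, 8)]
print(allocate_extents(free, 50))  # one extent suffices: [(40, 50)]
print(allocate_extents(free, 70))  # two needed: [(40, 64), (100, 6)]
```

Compared with per-block pointers, describing a run of blocks as a single (start, length) pair also keeps the metadata itself compact.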
The content of a file to be written is cached in RAM, whereas the actual write operation is delayed for as long as possible. Once the final size of the file is known, the chances are much higher that a sufficiently sized extent will be selected and no further extents will be required soon. Moreover, short-lived temporary files may never need to be written to disk at all and can be deleted directly from memory. Many modern file systems use delayed allocation, including Ext4, XFS, Btrfs, ZFS and HFS+.
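The essence of delayed allocation can be illustrated with a small sketch. The `DelayedFile` class and its `allocator` callback are invented for this example; real implementations work at the page-cache level, but the principle is the same: many small writes accumulate in memory, and on-disk space is reserved only once, when the final size is known.

```python
# Hypothetical sketch of delayed allocation: writes accumulate in a RAM
# buffer, and disk space is reserved only at flush time, when the
# file's final size is already known.

class DelayedFile:
    def __init__(self, allocator):
        self.buffer = bytearray()   # data cached in memory
        self.allocator = allocator  # called with the final size on flush

    def write(self, data):
        self.buffer += data         # no disk allocation happens here

    def flush(self):
        # One allocation for the whole file, instead of one per write.
        return self.allocator(len(self.buffer))

calls = []
f = DelayedFile(lambda size: calls.append(size) or [("extent", size)])
f.write(b"abc")
f.write(b"defgh")
extents = f.flush()
print(calls)  # [8] -> the allocator ran once, for all 8 bytes
```

Had allocation happened eagerly on every `write`, the allocator would have been invoked twice with partial sizes, making a scattered layout far more likely.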
Some file systems are capable of detecting fragmented files by certain criteria and relocating their fragments automatically in order to make them contiguous again. Such algorithms are employed, for instance, by APFS, HFS+, Ext4 and Btrfs.
In spite of that, fragmentation cannot be considered a solved problem even in modern file systems. They may still suffer from it as they age, especially in the following scenarios:
a low-capacity HDD or SSD is used as a system drive;
a large number of small partitions is arranged on a drive;
the storage is running out of free space (more than 85%–95% of the capacity is used);
large files are frequently edited, above all when free space is short;
files of different sizes are repeatedly deleted and written when the storage is almost full.
What are the negative effects of fragmentation?
As time goes by and the file system is actively used, files stored in it may be divided into hundreds or even thousands of fragments spread across the drive. Such a state may have a serious impact on performance, depending on what kind of digital medium is involved. Mechanical hard drives keep information on spinning disk platters. In order to access it, the device has to move its read-write head over the surface and find each fragment of the requested file. When these are scattered in completely different locations, reaching them and retrieving the entire file takes a lot longer than reading one large contiguous chunk. It also means the moving components of the drive are used much more intensively, which, in turn, shortens the life span of the device. In contrast, solid-state drives have no mechanical parts and are therefore far less susceptible to the performance degradation caused by fragmentation.
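The extra head travel is easy to quantify with toy numbers. The block positions below are invented for illustration; the point is only the arithmetic: the head must cover the distance between consecutive fragments, and scattered fragments multiply that distance by orders of magnitude.

```python
# Illustrative arithmetic: total head travel (in blocks) when reading a
# file laid out contiguously versus the same file split into fragments.

def seek_distance(positions):
    """Sum of the distances between consecutive fragment positions."""
    return sum(abs(b - a) for a, b in zip(positions, positions[1:]))

contiguous = [1000, 1001, 1002, 1003]   # adjacent blocks
fragmented = [1000, 9000, 200, 15000]   # scattered across the platter

print(seek_distance(contiguous))  # 3
print(seek_distance(fragmented))  # 8000 + 8800 + 14800 = 31600
```

On an SSD this distance is meaningless, since any cell can be addressed in roughly constant time, which is exactly why fragmentation barely affects solid-state performance.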
Apart from that, fragmentation makes matters worse when data loss occurs and the files need to be recovered. First of all, the probability that a fragmented file will lose its integrity by being partially overwritten with other data is much higher. Moreover, in some cases, the content may still be present on the storage, yet remain unrecoverable due to significant fragmentation. This aspect of the problem is examined in more detail below.
File system fragmentation in the context of data recovery
For a fragmented file to be restored correctly, it is necessary not only to determine its starting location, but also to identify all fragments belonging to it and arrange them in the right order. The file system usually relies on its metadata to trace this correspondence. When the service records are still available, it is usually possible to analyze them and discover which fragments are associated with which files. So, as long as a file is not overwritten, it is easy to undelete it, no matter how fragmented it may be.
But things get much worse when such files have to be extracted without the assistance of metadata. The latter may get corrupted due to some logical fault or damaged during a format operation. Worse still, certain file systems, like FAT/FAT32, deliberately wipe a part of it once a file is deleted. In this event, a data recovery tool can find the starting location of the file based on the knowledge of its structure. It will look for specific patterns within the drive’s raw content, known as signatures. However, the location where the next fragment starts can be extremely difficult or even impossible to predict, particularly when a file is fragmented into many pieces placed at a great distance from one another.
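Signature-based scanning can be sketched as a search for known magic bytes in a raw image. The signatures below are the real markers of the respective formats, but the scanner itself is a bare-bones illustration; note that it only reveals where files begin, not where their later fragments lie.

```python
# Hypothetical sketch of signature-based carving: scan a raw image for
# the magic bytes that mark the start of known file types.

SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",        # JPEG start-of-image marker
    b"\x89PNG\r\n\x1a\n": "png",    # PNG header
    b"PK\x03\x04": "zip",           # ZIP, also DOCX/XLSX containers
}

def carve(raw):
    """Return (offset, type) for every signature found in the raw bytes."""
    hits = []
    for sig, kind in SIGNATURES.items():
        pos = raw.find(sig)
        while pos != -1:
            hits.append((pos, kind))
            pos = raw.find(sig, pos + 1)
    return sorted(hits)

raw = b"\x00" * 16 + b"\xff\xd8\xff\xe0" + b"\x00" * 12 + b"PK\x03\x04"
print(carve(raw))  # [(16, 'jpeg'), (32, 'zip')]
```

For a contiguous file, reading from such an offset up to the next signature (or an end-of-file marker) often recovers it intact; for a fragmented file, the data following the signature soon veers off into unrelated content, which is precisely the difficulty described above.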
Fragmentation is a serious challenge for data recovery. Even the most sophisticated algorithms cannot guarantee a 100% result when dealing with heavily fragmented data in the absence of usable file system records. And, unfortunately, the fragmentation degree of the most valuable file types – images, videos, office documents, databases, emails, etc. – is generally high. It is not unusual for about 15% of images and even 50% of video files to become fragmented in the FAT/FAT32 file system, which is frequently used on USB sticks and other portable media. In view of this, taking precautions against fragmentation is essential to avoid permanent data loss in case of any logical mishaps. File systems that are most prone to it can be defragmented using specialized tools embedded into Windows or third-party utilities that are also developed for Linux and macOS.