Friday 30 July 2010

File Fragmentation and Defragmentation

After a partition has been created and formatted on a computer hard drive, all the sectors in the user data area belonging to this partition are marked as ‘Not in Use’. New files are then stored on these usable sectors sequentially and contiguously, and the sectors they occupy are marked as ‘Occupied’ (coloured in the figure below).

[Figure: sector map showing File0 | File1 | File2 | File3 | File4 | File5]

When some of the files are deleted or removed, the corresponding sectors become ‘Not in Use’ again. In this example, File0, File3 and File5 are deleted for demonstration.

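[Figure: sector map showing File1, File2 and File4 still ‘Occupied’; the sectors of File0, File3 and File5 are ‘Not in Use’]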

A new file, File6, is then introduced and stored on the sectors that are marked as ‘Not in Use’. As a result, File6 is split into three fragments.

[Figure: sector map showing File6 | File1 | File2 | File6 | File4 | File6]
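The sequence of events so far can be reproduced with a small simulation. This is only a toy sketch: the 12-sector map, the two sectors per file, and the helper names `allocate`, `delete` and `fragments` are assumptions made for illustration, and the "first free sectors" placement is a simplification of what a real file system does.

```python
FREE = None  # marks a sector that is 'Not in Use'

def allocate(sector_map, name, size):
    """Place `size` sectors of file `name` into the first free sectors found."""
    free = [i for i, owner in enumerate(sector_map) if owner is FREE]
    if len(free) < size:
        raise ValueError("not enough free sectors")
    for index in free[:size]:
        sector_map[index] = name
    return free[:size]

def delete(sector_map, name):
    """Mark every sector owned by `name` as 'Not in Use' again."""
    for index, owner in enumerate(sector_map):
        if owner == name:
            sector_map[index] = FREE

def fragments(sector_map, name):
    """Return the (start_sector, length) runs occupied by `name`."""
    runs = []
    for index, owner in enumerate(sector_map):
        if owner != name:
            continue
        if runs and runs[-1][0] + runs[-1][1] == index:
            # This sector directly follows the previous run: extend it.
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            runs.append((index, 1))
    return runs

# Hypothetical disk: 12 sectors, File0..File5 take two sectors each.
sector_map = [FREE] * 12
for n in range(6):
    allocate(sector_map, f"File{n}", 2)

for name in ("File0", "File3", "File5"):
    delete(sector_map, name)

# File6 needs six sectors and has to use the three separate gaps.
allocate(sector_map, "File6", 6)
file6_fragments = fragments(sector_map, "File6")
print(file6_fragments)  # [(0, 2), (6, 2), (10, 2)]
```

The three runs reported for File6 correspond to the three fragments in the figure.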

Suppose File6 is then deleted accidentally by the user. In the sector map, the fragments belonging to File6 are marked as ‘Not in Use’ (shown in grey) again:

[Figure: sector map showing File6 | File1 | File2 | File6 | File4 | File6, with the three File6 fragments in grey]

To recover the deleted File6, a recovery application locates the starting sector of the file by looking at the file header. However, it will always assume that the file was stored contiguously, without fragments. As a result, the recovered File6 will look like this:

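[Figure: sector map showing Recovered File6 | File6 | File4 | File6, where Recovered File6 spans the first File6 fragment, File1 and File2]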

The recovered File6 therefore contains the first fragment of File6 followed by content from File1 and File2. Even if this file can be opened, it will be corrupted.
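The effect of the contiguity assumption can be shown with a toy example. In this sketch each list element stands for one block of sectors labelled with its content; the `disk` layout and the function name `recover_contiguous` are hypothetical and are not taken from any real recovery tool.

```python
# Toy disk after File6 was deleted: its data is still physically present.
disk = ["File6-1", "File1", "File2", "File6-2", "File4", "File6-3"]

def recover_contiguous(disk, start_block, block_count):
    """Read `block_count` blocks in a row, assuming the file was not fragmented."""
    return disk[start_block:start_block + block_count]

# The start block and the size (three blocks) are known from the file's metadata.
recovered = recover_contiguous(disk, 0, 3)
original = ["File6-1", "File6-2", "File6-3"]

# Compare block by block with what File6 really contained.
matches = [got == expected for got, expected in zip(recovered, original)]
print(recovered)  # ['File6-1', 'File1', 'File2']
print(matches)    # [True, False, False]
```

Only the first block is correct; the rest of the "recovered" file is data from File1 and File2, although the file has the right size.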

When a conventional recovery application is used to recover lost or deleted files from hard drives, it is quite common for the files to be corrupted or partially corrupted when they are opened. For example, a JPEG is partially corrupted, or a Word document cannot be opened due to data corruption, or it opens but shows only unreadable characters. The problem becomes severe, even fatal, when the lost file is a database file (MS SQL, MS Access, Oracle, etc.).

The kinds of lost files mentioned above are deemed unrecoverable because:

· 1. Data sectors are overwritten by new files after the file is deleted;

· 2. On a hard drive with a FAT32 file system, the MSB (Most Significant Bytes) of the file's starting address are cleared to zero when the file is deleted. Without knowing the precise starting address of the deleted file, the recovery software simply assumes that the MSB are zero. Even though a file recovered this way has its original name and correct size, it is still found to be corrupted when opened. Some advanced file recovery applications have tried to solve this problem, but their solutions only work when the lost file has NO fragments on the hard drive;

· 3. Generally speaking, a database file is stored in discrete sector areas because it grows every day. Conventional recovery applications assume that files are stored in a contiguous, linear area on the hard drive, because they are not able to find all the discrete fragments that belong to a deleted file.
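Point 2 can be illustrated with a short sketch. In a 32-byte FAT32 directory entry, the starting cluster is stored as two 16-bit words: the high word at byte offset 20 and the low word at byte offset 26. The cluster number `0x0003A2F1` and the helper name `start_cluster` below are made up for the example, and the deletion step simply imitates the behaviour described above rather than any particular operating system's code.

```python
import struct

def start_cluster(dir_entry):
    """Combine the high and low 16-bit words of a FAT32 directory entry."""
    (high_word,) = struct.unpack_from("<H", dir_entry, 20)
    (low_word,) = struct.unpack_from("<H", dir_entry, 26)
    return (high_word << 16) | low_word

# Hypothetical directory entry whose file starts at cluster 0x0003A2F1.
entry = bytearray(32)
entry[0:11] = b"FILE6   DAT"                  # 8.3 short name
struct.pack_into("<H", entry, 20, 0x0003)     # most significant bytes
struct.pack_into("<H", entry, 26, 0xA2F1)     # least significant bytes
before_delete = start_cluster(entry)

# Imitate the deletion described in the text: the entry is flagged as
# deleted (first byte 0xE5) and the high word is cleared to zero.
entry[0] = 0xE5
struct.pack_into("<H", entry, 20, 0)
after_delete = start_cluster(entry)

print(hex(before_delete))  # 0x3a2f1
print(hex(after_delete))   # 0xa2f1
```

A recovery tool that trusts the entry as it stands would start reading at cluster `0xA2F1` instead of `0x3A2F1`, so it reads unrelated data even though the name and size are still correct.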

File Defragmentation (FD) technology is proposed in this article. File defragmentation here describes the process of searching for all the fragments that belong to the same file and reconstructing the file from the fragments found. Applying this technology to the example above, the recovered File6 should look like this:

[Figure: sector map showing File6-1 | File1 | File2 | File6-2 | File4 | File6-3, and the recovered file assembled as File6-1 | File6-2 | File6-3]

As a result, the deleted File6 is recovered successfully and is 100% intact. However, if any fragment belonging to File6 has been overwritten, even FD technology cannot help, simply because the data has been magnetically overwritten or removed.
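The reconstruction step, and its limit, can be sketched as follows. The article does not describe how the fragments are located, so this example assumes the list of fragment runs is already known; the tiny 4-byte sector size, the `disk` contents and the function name `reassemble` are all hypothetical.

```python
SECTOR_SIZE = 4  # toy value for readability; real sectors are much larger

# Toy disk: File6 data (AAAA, BBBB, CCCC) is interleaved with other files.
disk = bytearray(b"AAAA" b"1111" b"2222" b"BBBB" b"4444" b"CCCC")

def reassemble(disk, fragment_runs, file_size):
    """Concatenate the (start_sector, sector_count) runs in file order."""
    data = bytearray()
    for start_sector, sector_count in fragment_runs:
        begin = start_sector * SECTOR_SIZE
        data += disk[begin:begin + sector_count * SECTOR_SIZE]
    return bytes(data[:file_size])  # drop slack space after the end of the file

# Assumed result of the fragment search: three one-sector runs.
file6_runs = [(0, 1), (3, 1), (5, 1)]
intact = reassemble(disk, file6_runs, file_size=10)
print(intact)  # b'AAAABBBBCC'

# If a new file overwrites the second fragment, the data is gone for good.
disk[3 * SECTOR_SIZE:4 * SECTOR_SIZE] = b"XXXX"
damaged = reassemble(disk, file6_runs, file_size=10)
print(damaged)  # b'AAAAXXXXCC'
```

With all fragments untouched the file comes back byte for byte; once one fragment has been overwritten, knowing where the fragments were no longer helps.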


Written by: Zijian Xie (R&D Manager, BEng, MSc)
