Tuesday, June 12, 2012

Recovering lost ODS Spreadsheet on USB Key

This weekend I mistakenly wrote a Linux image onto my USB key instead of my SD card. The result was an 8GB USB thumbdrive with 25% of it overwritten.
Recovering the data with Recuva didn't really work, because the original FAT32 partition had been overwritten with a FAT and an ext4 partition. To make things worse, both the first and the second FAT were lost.
A deep scan retrieved apparently useful data, only for me to discover that the recovery process wasn't able to create valid files. For fragmented files, the recovery process took the first cluster and then simply grabbed the clusters that followed it sequentially, which mixed fragments from different files into the recovered file. What a disaster!
One of the files I desperately needed to recover was a 1850 KB OpenOffice spreadsheet in ODS format. Opening the file in OpenOffice resulted in a 'corrupted file, do you want to repair' message and an empty file.
A bit of research showed that ODS files are actually zipped containers holding the actual formatting and content. In my case, bits of a PPT file, an MP3 file and other junk had been mixed into the recovered file by the recovery process.
Finally I ended up using iBored as a forensic tool to perform some manual carving on the residual data on the thumbdrive.
By looking at a normal ODS file, I identified the different zones of the OpenDocument format. Special markers in the zipped structure of the ODS format are '50 4B 03 04' (the ZIP local file header signature) and '50 4B 07 08' (the ZIP data descriptor signature). Other interesting and easily identified markers were the file names and XML tags contained in the ODS file format, like 'META-INF/manifest', 'content.xml' and 'meta.xml'.
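The marker hunt described above can also be scripted against a raw dump of the drive. What follows is only a minimal Python sketch, not what iBored does internally: the names `find_all` and `scan_markers` are made up for illustration, and a 512-byte sector size is assumed.

```python
# ZIP signatures and ODS member names mentioned in the text.
ZIP_LOCAL_HEADER = bytes.fromhex("504B0304")     # 'PK\x03\x04'
ZIP_DATA_DESCRIPTOR = bytes.fromhex("504B0708")  # 'PK\x07\x08'
ODS_NAMES = [b"META-INF/manifest", b"content.xml", b"meta.xml"]


def find_all(data: bytes, needle: bytes) -> list:
    """Return every byte offset at which needle occurs in data."""
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


def scan_markers(data: bytes, sector_size: int = 512) -> dict:
    """Map each marker to a list of (sector, offset_in_sector) hits.

    sector_size=512 is an assumption; adjust it for other media.
    """
    markers = [ZIP_LOCAL_HEADER, ZIP_DATA_DESCRIPTOR] + ODS_NAMES
    return {
        marker: [divmod(off, sector_size) for off in find_all(data, marker)]
        for marker in markers
    }
```

Feeding it the bytes of a drive image gives a list of sectors worth inspecting by hand in a disk editor. A hit only says that a marker is present in that sector, not which file it belongs to.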
The thumbdrive happened to have 128-sector clusters (64KB), and the first cluster recovered by Recuva happened to be the correct start of the file. Knowing that a cluster only ever belongs to one file, and that a sudden change of content between consecutive clusters, or zeroed sectors at the end of a cluster, means there's a break in the cluster sequence, I was able to verify the signature of the clusters following the first valid cluster of the file I was trying to recover. Using iBored as a rudimentary disk editor, I first searched for the 'META-INF/manifest' string typically found at the end of an ODS document. It allowed me to pinpoint the end of the file approximately 7795 sectors after the first sector. I made a list of cluster boundaries and started identifying the content of each cluster by looking at typical signatures like 'MZ ...', directory entry structures, typical binary or DLL structures, plain text and randomized data.
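The list of cluster boundaries is simple arithmetic. As a rough sketch in Python (the helper names are hypothetical, 512-byte sectors are assumed from 64KB / 128 sectors, and the first sector of the file is assumed to sit on a cluster boundary):

```python
SECTOR_SIZE = 512            # bytes, assumed from 64KB / 128 sectors
SECTORS_PER_CLUSTER = 128


def cluster_of(sector_offset: int) -> int:
    """Cluster index, relative to the file's first cluster, holding a sector."""
    return sector_offset // SECTORS_PER_CLUSTER


def cluster_boundaries(first_sector: int, last_sector_offset: int) -> list:
    """List (cluster_index, start_sector, end_sector) for each candidate cluster."""
    rows = []
    for idx in range(cluster_of(last_sector_offset) + 1):
        start = first_sector + idx * SECTORS_PER_CLUSTER
        rows.append((idx, start, start + SECTORS_PER_CLUSTER - 1))
    return rows


def describe(cluster: bytes) -> str:
    """Very rough first-pass label for a cluster, using two of the hints above."""
    if cluster[:2] == b"MZ":
        return "executable (MZ header)"
    if cluster.endswith(bytes(SECTOR_SIZE)):
        return "zeroed tail, likely end of a file"
    return "unclassified"
```

Calling `cluster_boundaries(first_sector, 7795)` with the real starting sector lists every cluster between the start of the file and the sector holding the 'META-INF/manifest' string. `describe` covers only two of the signatures mentioned; the rest still need a human eye.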
Although zipped data doesn't contain any internal index that would tell which cluster comes next, it is characterized by heavy compression and full use of the 8-bit space, with no wasted space from repeated values or patterns. This way, 47 clusters could be eliminated as not belonging to the lost file. Looking at the remaining clusters, it looked as if the file was stored in 2 chunks with some garbage in between.
Using the write block functionality of iBored, I wrote the 2 chunks out to the hard drive and reassembled the original file with a binary copy:
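The "full use of the 8-bit space" can be put into a number: the Shannon entropy of the byte values in a cluster, which approaches 8 bits per byte for compressed data. I did this by eye, so the following Python is only an illustrative sketch; the function names and the 7.5 threshold are assumptions, not something taken from a tool.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Entropy of the byte values in block, in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_compressed(block: bytes, threshold: float = 7.5) -> bool:
    """Guess whether a cluster holds compressed data.

    threshold=7.5 is an arbitrary cut-off chosen for illustration.
    """
    return shannon_entropy(block) >= threshold
```

Plain text, directory entries and zero-filled sectors score far below the threshold. Other compressed or encrypted data, such as an MP3 fragment, scores just as high as zipped data, so this can only rule clusters out, never confirm them.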
copy /b chunk1+chunk2 final.ods
By opening the resulting file in 7-Zip, I could validate the correctness of the assembled data.
Opening the file in OpenOffice resulted in a 'Corrupted file - do you want to correct ?' message, and Calc was able to repair the inconsistencies without losing any data or formatting information.
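The reassembly and the integrity check can also be done without 7-Zip. This is a small sketch using Python's standard zipfile module; `reassemble` and `check_zip` are names invented here, and the chunks are assumed to be already in the right order.

```python
import io
import zipfile


def reassemble(chunks: list) -> bytes:
    """Concatenate byte chunks in order, like `copy /b chunk1+chunk2`."""
    return b"".join(chunks)


def check_zip(data: bytes) -> tuple:
    """Return (member_names, first_bad_member) for a ZIP held in memory.

    first_bad_member is None when every member passes its CRC check.
    Raises zipfile.BadZipFile if the data can't be read as a ZIP at all.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist(), zf.testzip()
```

For an ODS file, the member list should include 'content.xml' and 'META-INF/manifest.xml'. A member reported as bad points to a chunk that is wrong or incomplete.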
YAY !