Grokking grep and data disasters.

I’m not talking about the Linux vulnerability that my otherwise awesome web host apparently suffered through — though having this humble site go down for nigh on three days was certainly bad. No, I’m talking about a little data disaster of my own. I’m posting it here in the hopes that I can learn from my own mistakes. And maybe you can too, before it’s too late.

So… I’ve had two fairly important text files on my netbook for the past week or so. Friday afternoon I decided to move them to my main computer via SD card. Near as I can recall, here’s what went down:

  1. I copied said files to said card and trashed the originals on my netbook.
  2. I walked the SD card over to my desktop computer only to find that they hadn’t actually been copied for some reason.
  3. Thinking I had a faulty card, I went back to the netbook, pulled the files from my trash folder and copied them again, this time to a USB stick. I checked the stick to verify that they were there and, satisfied, put the netbook files back into the trash and emptied it. I know, stupid.
  4. Guess what happened when I plugged the stick into my desktop computer? That’s right, no files.
  5. Returning to the netbook, I realized that the files were still open in AbiWord. Not seeing the actual files anywhere, I tried to reload one of them (I know, stupid), and when that didn’t work, I saved the other as a new file.

So with one file gone I turned to Google, and the results didn’t look good, particularly for the ext3 file system on my Xubuntu-powered netbook. I really wanted to understand one detailed guide I found, but I honestly got bleary-eyed after about the tenth screen.

And then I started reading about a terminal utility called grep and how it could be used for file recovery. The shortest command I could find, with the fewest variables, was this:

grep -a -B 25 -A 100 'some string in the file' /dev/sda1 > results.txt

…which would supposedly dump the 25 lines before and the 100 lines after each occurrence of that text string into a file. What the author didn’t state was that if that file is saved to the same partition being searched, writing it might overwrite the very data I was trying to recover. Kind of an important point, that.

Fortunately I had already picked up on this (it was in the comments); unfortunately when I ran this off of a live USB stick there wasn’t enough available memory to write the results to a file. Facing no other alternatives (that my feeble brain could understand), I booted from the same disk where my missing file sat in limbo and ran the command from that, with predictable results.
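For anyone in the same spot, here is a minimal sketch of that same grep recovery that keeps its output off the disk being searched. The function name recover_text, the -F flag (which matches the remembered text literally rather than as a regular expression) and the example paths are my own assumptions, not part of the instructions I found; reading a raw partition like /dev/sda1 also requires root.

```shell
#!/bin/sh
# Sketch: search a partition (or a disk image) for a string remembered from a
# lost text file, and save the surrounding lines to a file on ANOTHER device.
# Best run from a live USB so the damaged partition is not mounted read-write.
recover_text() {
    src=$1      # e.g. /dev/sda1 (needs root) or a disk image file
    needle=$2   # a line of text you remember from the lost file
    out=$3      # hypothetical path on a second drive, never on $src
    # -a: treat binary data as text
    # -F: match the needle literally, not as a regular expression
    # -B 25 / -A 100: keep 25 lines before and 100 lines after each match
    grep -a -F -B 25 -A 100 -- "$needle" "$src" > "$out"
}
```

Run as root with the results going to a second drive, something like recover_text /dev/sda1 'some string in the file' /media/usb/results.txt would do it (/media/usb is a hypothetical mount point for an external disk). Writing to a real disk rather than the live system's RAM would also sidestep the out-of-memory problem I hit.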

Lessons Learned

I consider myself fairly diligent about making backups: each and every week I archive this very site to a local file, then copy that to a private Dropbox folder. Each month I back up the home folders of both my computers (plus smartphone) to optical media. And just to be safe, I now clone both boot drives to separate external hard disks using the awesome Clonezilla.

But for time-sensitive, critical files I’ve learned that I should probably keep at least three copies, with one of them stored off-site. It would have been a happy ending to this story had I been able to recover my lost file with that grep command; on the other hand, it’s probably a safer strategy not to count on file recovery as an option in the first place.
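Looking back, the real mistake was deleting the originals before confirming the copies had actually landed. Here is a minimal sketch of a copy, verify, then delete habit. The function name safe_move and the destination folders are hypothetical, and it simply uses cp and cmp rather than any particular backup tool. It is no guarantee: the byte-for-byte check may be answered from memory rather than from the card itself, as the comments note.

```shell
#!/bin/sh
# Sketch: copy a file to one or more destination folders, check each copy
# byte-for-byte, and delete the original only if every copy matches.
safe_move() {
    src=$1
    shift
    name=${src##*/}   # file name without its leading directories
    for dest in "$@"; do
        cp -- "$src" "$dest/" || return 1
        # Flush pending writes to disk before comparing.
        sync
        # Caveat: cmp may read the copy back from the kernel's cache rather
        # than from the card itself, so the safest check is still to eject,
        # re-insert (or open it on the other machine) and look again.
        if ! cmp -s -- "$src" "$dest/$name"; then
            echo "copy to $dest did not verify; keeping $src" >&2
            return 1
        fi
    done
    rm -- "$src"
}
```

For example, safe_move notes.txt /media/sdcard /media/usbstick (both hypothetical mount points) copies the file to both places and removes notes.txt only if both copies match. Opening the copies on the desktop before emptying the trash is still the more reliable test.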

Unless there’s something you wanted to school me on…?