
Interestingly, since "recovery" is mentioned several times, I decided to test this myself.

I took a copy of a jpeg image, compressed it separately with gzip and with bzip2, then modified one byte with a hex editor.

The recovery instructions for gzip are to simply run "zcat corrupt_file.gz > corrupt_file", while for bzip2 they are to use the bzip2recover command, which just dumps the blocks out individually (corrupt ones and all).

Uncompressing the corrupt gzip jpeg via zcat always produced an image file the same size as the original, and it could be opened with any image viewer, although the colors were clearly off.

I never could recover the image compressed with bzip2. Trying to extract all the recovered blocks made by bzip2recover via bzcat would just choke on the single corrupted block. And the smallest you can make a bzip2 block is 100K (vs 32K for gzip?). Obviously pulling 100K out of a jpeg will not work.

Though I'm still confused as to how the corrupted gzip file extracted to a file the same size as the original. I guess gzip writes out the corrupted data as well instead of choking on it? Either way, gzip is the winner here: having a file with one corrupted byte is much better than having a file with 100K of data missing...
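For anyone who wants to reproduce the zcat-style behaviour without a hex editor, here is a rough sketch in Python (my own code, not from the original test): feed a gzip stream with one flipped byte through `zlib.decompressobj` in chunks, and keep whatever decoded before any error, instead of discarding everything.

```python
import gzip
import zlib

original = bytes(range(256)) * 64          # stand-in for the jpeg payload
compressed = gzip.compress(original)

corrupted = bytearray(compressed)
corrupted[len(corrupted) // 2] ^= 0xFF     # flip one byte mid-stream

# zcat-style recovery: decompress incrementally and keep the partial
# output from the chunks that decoded cleanly before the error.
d = zlib.decompressobj(wbits=31)           # 31 = expect a gzip wrapper
recovered = b""
try:
    for i in range(0, len(corrupted), 64):
        recovered += d.decompress(bytes(corrupted[i:i + 64]))
except zlib.error:
    pass                                   # stream broken past this point

print(f"original {len(original)} bytes, recovered {len(recovered)} bytes")
```

How much comes out depends entirely on where the flipped byte lands, which is exactly the point the replies below make.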



Your method is clearly flawed. Altering a single byte once is an insufficient test unless you first analyzed the structure of the compressed file to see where the really important information is stored. It may well be that you modified a verbatim string from the source data in the gzip case, but corrupted a bit of metadata about how the compressed data is structured in the bzip2 case. If you tried different random bytes, the results might be reversed.

The proper test would be to iterate over every bit in the compressed file, flip it, and try to recover. Then compare the number of successful recoveries against the number of bits tested. Compression algorithms that perform similarly should have similar likelihoods that a single bit flip corrupts the entirety of the data.
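That exhaustive test is easy to sketch with Python's stdlib (my own code; note that "recovery" here just means the library decompresses without raising, so the formats' built-in checksums make this a strict criterion):

```python
import bz2
import zlib

def count_surviving_flips(blob, decompress):
    """Flip every bit of `blob` once and count how many corrupted
    copies still decompress without raising an error."""
    ok = 0
    for i in range(len(blob)):
        for bit in range(8):
            corrupted = bytearray(blob)
            corrupted[i] ^= 1 << bit
            try:
                decompress(bytes(corrupted))
                ok += 1
            except Exception:
                pass
    return ok, len(blob) * 8

payload = bytes(range(256)) * 16          # stand-in for the jpeg
for name, blob, fn in [
    ("zlib (deflate)", zlib.compress(payload, 9), zlib.decompress),
    ("bzip2", bz2.compress(payload, 9), bz2.decompress),
]:
    ok, total = count_surviving_flips(blob, fn)
    print(f"{name}: {ok}/{total} single-bit flips still decompress")
```

The ratio is what matters for comparing the two formats; the absolute counts depend on the payload and on how strictly the checksum is enforced.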


I thought about that as well. I tried it three different times, all with the same results.


Three? Well then, case closed!


Did the poster imply that their test was the be-all and end-all of error tolerance in common-use compression systems? No. Then why did you assume that they did say that, and then write such a useless comment?


Whether recovery leads to (almost) useable data depends on what byte you modify. It's entirely possible that a single corrupt byte in the compressed data leads to a single corrupt byte when uncompressed. When you are dealing with images you may not even notice that a single pixel is wrong. But it's also possible that you completely destroy the data such that the decompression algorithm can't even deal with it and has to give up.
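The best case is easy to demonstrate (my own sketch, not from the thread): deflate at compression level 0 stores the input verbatim in "stored" blocks, so a bit flip landing in the payload corrupts exactly one output byte.

```python
import zlib

data = b"The quick brown fox jumps over the lazy dog. " * 10

# Level 0 deflate emits "stored" (verbatim) blocks; wbits=-15 means a
# raw stream with no zlib/gzip wrapper and no checksum to trip over.
c = zlib.compressobj(0, zlib.DEFLATED, -15)
stream = bytearray(c.compress(data) + c.flush())

stream[20] ^= 0x01   # flip one bit well inside the verbatim payload

out = zlib.decompress(bytes(stream), wbits=-15)
wrong = sum(a != b for a, b in zip(out, data))
print(f"{len(out)} bytes out, {wrong} byte(s) differ from the original")
```

Flip a bit in the five-byte block header instead, and the same stream can fail to decode at all, which is the other outcome described above.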


A decade and a half ago, I wrote an Oracle archived log that I had compressed with bzip2 to a DLT40 tape.

I recovered and uncompressed the log (without error), then tried to apply it to a database recovery, which rejected it as corrupt.

After several attempts to read the tape (amounting to dozens of hours), I finally put it in the original drive that wrote it and pulled the file to the remote recovery system - this worked.

I immediately began including PAR2 files on the tapes, so the restored contents could be verified and corrected.

I have my doubts that bzip2 is as sensitive to corruption as the author asserts, but perhaps there have been improvements to the code since my misfortune.



