Arman's stuff
The Art Of Hiding

(Wed Jul 13 12:17:08 2011)

I've always been interested in steganography - that is, putting data into otherwise 'normal' files.

If you want to send a message to someone, it's easy enough to write a letter or send an email. But what if your message was supposed to be secret? Perhaps you're a spy, trying to smuggle information out of a country, or involved in illegal activity and trying to cover your actions, or perhaps you're just paranoid (is it still being paranoid when they're really out to get you?). If you send a plain-text message and it is intercepted, the 'enemy' can immediately read it. You could encrypt the letter, which would keep someone from reading it, but it would alert them to the fact that this is supposed to be secret. So you can't send the message unencrypted, for fear the enemy will read it and realize you're a spy, and you can't send it encrypted, or the enemy will become suspicious. What then?

Enter steganography, the art of hiding things in plain sight. The simplest forms use a simple mathematical key. For instance, someone wants to send a message: mission complete. He hides his message inside another, like so:
Minibusses disposing color mumps literature
If you look at every third character (starting with the first, and counting spaces), it spells out 'mission complete'. Since that sentence is... somewhat odd... he could further hide it by using those words as the first on each line of a longer letter, or just picking different words.
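As a minimal sketch, here is how the receiving end could pull that message back out in Python. The function name `every_nth` is made up for this example, and it assumes the counting starts at the first character and includes spaces.

```python
def every_nth(text, step=3):
    """Return every `step`-th character of `text`, starting with the first."""
    return text[::step]


cover = "Minibusses disposing color mumps literature"
print(every_nth(cover))  # prints "Missioncomplete"
```

Note that the space between the two words is not carried along; the reader has to put it back.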

These days, however, computers make the task of encoding (and decoding) words fairly simple - never mind that those words would rarely show up in a normal letter. And while hiding plain text is easy with that method, what if our spy needed to hide an image? The bigger the file he needs to hide, the bigger the thing he needs to hide it in. The code above hid a letter once every three; that means that 10 characters (which equals 10 bytes) would take up roughly 30 bytes - a 200% inflation, to use the lingo. But hiding binary data - that is, stuff that isn't plain text - needs a lot more space to keep from being noticed.

A popular method is to embed the files into images or sound files. If you take an image, and replace the lowest bit of every byte with a bit from your data, you could encode your entire file into an image fairly easily. Of course, the image would have to be large - the size of the header and footer, plus 8 times the size of your file. If you wanted to encode 'mission complete' into a Windows bitmap file as plain text, you'd need it to be a minimum of 128 bytes long, plus the size of the header. That's 16 characters, times 8 bits per character, with each bit taking up one byte of the image. A 24-bit Windows bitmap (bmp) uses three bytes per pixel - one each for red, green, and blue. That means our image needs to have at least 128/3 = 42.667 pixels; a 6x8 image would have 48 pixels, which is perfect. So, for our 16-character message, we'll need a 6x8 Windows bitmap image.

But, like his sentence above, our spy decides to add a bit more padding to make it believable - who emails a 6x8 Windows bitmap? So, instead, we'll only modify every 10th byte. That way, we only modify the color of a given pixel once, and it's a different color channel (red, blue, or green) every time. Now we'll use 1280 bytes, or a 16x27 pixel image. Hmm, that could almost be a user icon! And that brings us to the final step of the plan - the spy can upload this image as a user icon to a legitimate message board (about cars, gardening, or something else 'normal'), the enemy won't suspect it, and the message will get out loud and clear.
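The lowest-bit trick with a gap of 10 bytes could be sketched roughly like this in Python. This is only an illustration under a few assumptions: `cover` holds just the raw pixel bytes (the bitmap header is handled elsewhere), message bits are stored most significant bit first, and the names `embed` and `extract` are made up for this example.

```python
def embed(cover, message, stride=10):
    """Hide `message` (bytes) in the lowest bit of every `stride`-th cover byte."""
    out = bytearray(cover)
    # Flatten the message into single bits, most significant bit first.
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    if bits and (len(bits) - 1) * stride >= len(out):
        raise ValueError("cover is too small for this message")
    for n, bit in enumerate(bits):
        pos = n * stride
        out[pos] = (out[pos] & 0xFE) | bit  # clear the lowest bit, then set it
    return bytes(out)


def extract(stego, length, stride=10):
    """Read `length` bytes back out of the lowest bits of `stego`."""
    result = bytearray()
    for c in range(length):
        value = 0
        for i in range(8):
            value = (value << 1) | (stego[(c * 8 + i) * stride] & 1)
        result.append(value)
    return bytes(result)


cover = bytes(range(256)) * 5  # 1280 bytes of stand-in pixel data
stego = embed(cover, b"mission complete")
print(extract(stego, 16))  # prints b'mission complete'
```

The receiver has to know the message length (16 here) and the stride in advance; a real scheme would need some way to agree on both.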

But what if the enemy is suspicious of our spy? Perhaps they notice our spy always uploads a new user image just before or just after something is stolen. They realize the image might contain something, and after running a brute force scan on the image, they find the changed bits and piece together the message. How to elude a scan like that? The trick is to scatter the data less predictably. There are many ways to do that - picking a semi-random sequence, for instance. Pick a number between 1 and 16, and divide it by 17. Then, instead of using every 10th byte, use the next digit in the sequence as the distance. 4/17 is 0.2352941176..., so the first bit goes in spot 0, the second one is 2 bytes away from that, the third is 3 bytes away from that, and so on. (A digit of 0 needs special handling, such as counting it as 10, or two bits would land on the same byte.) Unless you know the original number, the data is much harder to find, since the irregular spacing helps it blend in - though with only 16 numbers to choose from, a determined enemy could try them all. Plus, the file could be smaller, since each bit is at most 10 bytes away, instead of always 10.
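Here is a rough Python sketch of turning a fraction like 4/17 into byte positions. The digits come from plain long division, so there are no floating-point rounding surprises. The names `fraction_digits` and `bit_positions` are made up for this example, and counting a digit of 0 as a step of 10 is an assumption of this sketch, added so that two bits never land on the same byte.

```python
def fraction_digits(numerator, denominator=17):
    """Yield the decimal digits of numerator/denominator, one at a time, forever."""
    remainder = numerator % denominator
    while True:
        remainder *= 10
        yield remainder // denominator
        remainder %= denominator


def bit_positions(count, numerator, denominator=17):
    """Return the byte offsets for `count` hidden bits, starting at spot 0."""
    digits = fraction_digits(numerator, denominator)
    positions = []
    pos = 0
    for _ in range(count):
        positions.append(pos)
        step = next(digits)
        pos += step if step else 10  # assumption: a 0 digit counts as 10
    return positions


print(bit_positions(5, 4))  # prints [0, 2, 5, 10, 12]
```

The sender and receiver only need to share the one number (4 here) to walk the same positions.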

Or, you could make the spacing effectively random by using the upper 7 bits of each byte to determine the distance to the next bit - that is, if the current byte (ignoring its lowest bit) has the value 72, the next bit would be 72 bytes away. That way, the message would be encoded into each image completely differently.
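One way this self-steering idea might look in Python is sketched below. It reads "the upper 7 bits" as the byte's value with its lowest bit masked off, and it bumps a distance of 0 up to 1 so the walk always moves forward; both are assumptions of this sketch, and the function names are made up.

```python
def embed_chained(cover, bits):
    """Hide `bits` (a list of 0/1) where each byte's upper 7 bits pick the next spot."""
    out = bytearray(cover)
    pos = 0
    for bit in bits:
        if pos >= len(out):
            raise ValueError("cover is too small for this message")
        out[pos] = (out[pos] & 0xFE) | bit
        # The upper 7 bits are untouched by the embedding, so the receiver
        # computes exactly the same jump. Assumption: a jump of 0 becomes 1.
        pos += max(1, out[pos] & 0xFE)
    return bytes(out)


def extract_chained(stego, count):
    """Follow the same chain of jumps and read back `count` bits."""
    bits = []
    pos = 0
    for _ in range(count):
        bits.append(stego[pos] & 1)
        pos += max(1, stego[pos] & 0xFE)
    return bits
```

Since a single jump can be as long as 254 bytes, this variant needs a much larger image per hidden bit than the fixed gap of 10.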

Putting a lot of data into an image is hard, though. The more data there is, the easier it is to spot, and if you have the original image, it's a cinch to compare the original and the copy to determine the difference. How could you go about modifying the original image, without anyone suspecting that it's been modified? Well, the best way would be to change the compression. Running the image through a converter makes the image slightly different every time. If you change the compression level on a jpg, the bits inside become radically different. So, if our spy's encoding program can change the compression of the file before it starts modifying bits, even if the file is found, the enemy won't be able to simply compare the two files. The data is safe - and, what's more, there is no proof that the image contains any data at all.

Once again, however, we come across a problem - a short message is easy to encode in a small image, but what happens when our intrepid spy wants to send a zip file with blueprints, or a sound clip of the enemy's secret meeting? Even a highly compressed mp3 is 480 kB per minute of conversation. Even replacing the lowest bit of every byte means the file will have to be 3.75 MB for every minute of sound! A 10 minute conversation would be a huge image file - far more than most message boards allow for user images. So what then?
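The size arithmetic here is easy to check with a few lines of Python. The helper name `cover_bytes_needed` is made up for this example, and it treats 1 kB as 1024 bytes, which is what makes the 3.75 MB figure come out.

```python
def cover_bytes_needed(payload_bytes, stride=1):
    """Approximate image bytes needed when one payload bit goes in every `stride`-th byte."""
    return payload_bytes * 8 * stride


KB = 1024
MB = 1024 * KB

one_minute = cover_bytes_needed(480 * KB)
print(one_minute / MB)                    # prints 3.75
print(10 * one_minute / MB)               # prints 37.5
print(cover_bytes_needed(16, stride=10))  # prints 1280
```

The last line is the 16-character message with a gap of 10 from earlier, for comparison.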

Compression is your friend. If all you want to do is send text, create your own alphabet; 0-9, a-z, and your choice of 27 symbols uses only 6 bits, instead of 8. If you can get away with only letters and 5 symbols, you can further reduce that to 5 bits per character. 'Mission complete' would then only be 10 bytes long. If the message is long enough, use a compression program; simple compression like zip or gzip can compress text fairly well, but for serious compression, take a look at the PAQ family. It can compress 1 GB of plain text into only 133.4 MB (13.34% of the original size). Granted, it takes 18 hours to do that... but it saves almost 200 MB as compared to the fast (2 minute) compressor. If you need to get a significant amount of data out, it's the best way to compress it. But the resulting file is still pretty big... what then? Well, the best way to hide it is in a series of pictures. If we can only fit, say, 5 MB of data into an image, we'd need over 26 images to fit it all in. Photo sharing sites like Flickr might come in handy for that - but our spy would have to make sure to upload images that make sense. A cover as a budding photographer would be a handy excuse.
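The 5-bit alphabet idea could look something like this in Python. The particular five symbols (space, period, comma, question mark, exclamation mark) and the function names are made up for this sketch, and it reserves code 0 as padding so the decoder knows where the text stops; that choice is an assumption, though it does fit nicely with 26 letters plus 5 symbols using 31 of the 32 available codes.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz .,?!"  # 31 symbols; code 0 is kept for padding


def pack5(text):
    """Pack `text` into 5 bits per character (raises ValueError on unknown characters)."""
    bits = 0
    nbits = 0
    for ch in text.lower():
        bits = (bits << 5) | (ALPHABET.index(ch) + 1)
        nbits += 5
    pad = (-nbits) % 8  # zero bits added to fill up the last byte
    return (bits << pad).to_bytes((nbits + pad) // 8, "big")


def unpack5(data):
    """Turn packed bytes back into text, stopping at the first padding code."""
    bits = int.from_bytes(data, "big")
    chars = []
    for shift in range(len(data) * 8 - 5, -1, -5):
        code = (bits >> shift) & 0x1F
        if code == 0:
            break
        chars.append(ALPHABET[code - 1])
    return "".join(chars)


packed = pack5("Mission complete")
print(len(packed))      # prints 10
print(unpack5(packed))  # prints "mission complete"
```

Upper and lower case are folded together, which is part of the price of squeezing each character into 5 bits.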

Of course, there are more than just images we could add code to; if you have the skills, you could add timing delays to programs, networks, or even hardware; perhaps a mouse that sends messages using the lowest bit of its 'distance moved' sensor? Or even a pencil that uses lead with alternating tint or hardness, so if it is used evenly, the line it draws contains a message.

So how about you? What would you hide your messages and data inside?

