This was basically a shower thought: how could I store files online, in plain sight, for free? Because who doesn’t like a good ol’ game of hide and seek. But with files. On the internet.
Update: someone pointed out that I made a mistake with the meaning of the 4th byte of the chunk type. I’ve updated the table to reflect the proper meaning.
The goals:
- Hide files in plain sight
- Allow them to be distributed via free public channels, e.g. Twitter, Reddit, Imgur.
Finding a format
I spent an evening reading up on different file formats. I considered all sorts of them, but none really tickled my fancy. Until I ran across PNGs. PNG files are very well structured, and soon you’ll realise why they’re perfect for storing a payload. PNG files start with an 8-byte signature, 89 50 4E 47 0D 0A 1A 0A. The first byte is a non-ASCII character, and bytes 2 through 4 spell out PNG in ASCII. The remaining bytes are a DOS line ending (CR LF), the DOS EOF character, and a Unix line ending (LF).
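For reference, here is a minimal Python 3 sketch of checking that signature. The names PNG_SIGNATURE and has_png_signature are illustrative, not taken from the original source; each assertion documents one part of the signature described above.

```python
# The fixed 8-byte signature at the start of every PNG file.
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def has_png_signature(data: bytes) -> bool:
    """Return True if `data` starts with the PNG signature."""
    return data[:8] == PNG_SIGNATURE


assert PNG_SIGNATURE[0] == 0x89        # non-ASCII byte (high bit set)
assert PNG_SIGNATURE[1:4] == b'PNG'    # "PNG" in ASCII
assert PNG_SIGNATURE[4:6] == b'\r\n'   # DOS line ending (CR LF)
assert PNG_SIGNATURE[6] == 0x1A        # DOS end-of-file character
assert PNG_SIGNATURE[7:8] == b'\n'     # Unix line ending (LF)
```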
What follows are so-called chunks. The PNG I’ll use in this example comes from the Wikipedia page on the PNG format and can be found here.
Every chunk follows the same format:
| Chunk size | Chunk type | Chunk content | CRC |
|---|---|---|---|
| 4 bytes | 4 bytes | N bytes | 4 bytes |
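To make the layout concrete, here is a Python 3 sketch that parses the chunk starting at a given offset in the raw file bytes. read_chunk is a hypothetical helper; it doesn’t verify the CRC, it only slices the four fields apart.

```python
import struct


def read_chunk(data: bytes, offset: int):
    """Parse the chunk at `offset`; return (type, content, crc, next_offset)."""
    # 4-byte big-endian unsigned size, then the 4-byte ASCII chunk type.
    size, chunk_type = struct.unpack('>I4s', data[offset:offset + 8])
    start = offset + 8
    content = data[start:start + size]
    # The 4-byte CRC follows the content directly.
    (crc,) = struct.unpack('>I', data[start + size:start + size + 4])
    return chunk_type.decode('ascii'), content, crc, start + size + 4


# An empty IEND chunk: size 0, type "IEND", CRC ae426082.
iend = b'\x00\x00\x00\x00IEND\xae\x42\x60\x82'
print(read_chunk(iend, 0))
```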
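The image from the Wikipedia page contains three chunks:
- IHDR contains metadata about the image, such as its width and height.
- IDAT contains the actual image data.
- IEND marks the end of the file.
The chunk type naming follows a very clear convention, where the case of each letter encodes a property of the chunk: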
| | First letter | Second letter | Third letter | Fourth letter |
|---|---|---|---|---|
| Uppercase | Critical chunk | Standard chunk | Reserved | Not safe to copy |
| Lowercase | Non-critical chunk | Non-standard chunk | n/a | Safe to copy |
For example, for the IHDR chunk:
I: it’s a critical chunk, i.e. the file can’t be rendered without it.
H: it’s an official chunk type that’s been standardized in the spec.
D: the reserved letter, which always needs to be uppercase.
R: it’s unsafe to copy if the critical chunks have been edited, because it depends on the image data.
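The case convention boils down to one bit per letter: bit 5 (value 0x20) of each byte is 0 for uppercase and 1 for lowercase ASCII letters. A minimal Python 3 sketch that decodes it (chunk_properties and its dictionary keys are made up for illustration):

```python
def chunk_properties(chunk_type: str) -> dict:
    """Decode the property bits carried by the case of a chunk type's letters."""
    if len(chunk_type) != 4 or not (chunk_type.isascii() and chunk_type.isalpha()):
        raise ValueError('a chunk type is exactly 4 ASCII letters')
    # Bit 5 (0x20) is clear for uppercase ASCII letters and set for lowercase.
    upper = [(ord(c) & 0x20) == 0 for c in chunk_type]
    return {
        'critical': upper[0],          # 1st letter uppercase: critical chunk
        'standard': upper[1],          # 2nd letter uppercase: standardized chunk
        'reserved_ok': upper[2],       # 3rd letter must be uppercase
        'safe_to_copy': not upper[3],  # 4th letter lowercase: safe to copy
    }


print(chunk_properties('IHDR'))
# {'critical': True, 'standard': True, 'reserved_ok': True, 'safe_to_copy': False}
```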
Getting down and dirty
First we have to come up with a chunk name. One of my coworkers calls everyone a little punk, and since chunk types need to be 4 ASCII characters, punk is perfect. Following the table above on chunk type naming, I settled on puNk: non-critical, non-standard, uppercase reserved letter, and safe to copy.
To make my life easier, I’m working with some helper functions such as _read_bytes_as_int, _read_bytes_as_ascii and _read_bytes_as_hex. You can find a link to the complete source at the bottom of this post.
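The helpers themselves aren’t shown in this post. Assuming they simply read N bytes and convert them, a Python 3 sketch could look like the following. These standalone functions take a file object and are guesses, not the author’s actual methods.

```python
import io


def read_bytes(f, n: int) -> bytes:
    """Read exactly n bytes, failing loudly if the file ends early."""
    data = f.read(n)
    if len(data) != n:
        raise EOFError(f'wanted {n} bytes, got {len(data)}')
    return data


def read_bytes_as_int(f, n: int) -> int:
    # PNG stores integers in big-endian (network) byte order.
    return int.from_bytes(read_bytes(f, n), 'big')


def read_bytes_as_ascii(f, n: int) -> str:
    return read_bytes(f, n).decode('ascii')


def read_bytes_as_hex(f, n: int) -> str:
    return read_bytes(f, n).hex()


# The IHDR size, type and CRC from the Wikipedia image (content left out).
f = io.BytesIO(b'\x00\x00\x00\x0dIHDR' + b'\x9a\x76\x82\x70')
print(read_bytes_as_int(f, 4), read_bytes_as_ascii(f, 4), read_bytes_as_hex(f, 4))
# 13 IHDR 9a768270
```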
Let’s open up our file:
self._file = open(input_file, 'rb+')
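We open it in binary mode ('b') so Python doesn’t translate line endings or otherwise mangle the raw bytes, and in read/write mode ('r+') because we’ll write the payload into this same file later on.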
Next, we read the first 8 bytes of the file. That’s the byte signature, which we’re not really interested in. What should come up next are chunks.
chunk_size = self._read_bytes_as_int(4)
print 'Chunk size:', chunk_size
chunk_type = self._read_bytes_as_ascii(4)
print 'Chunk type:', chunk_type
content = self._read_bytes(chunk_size)
crc = self._read_bytes_as_hex(4)
print 'CRC:', crc
Which will output:
Chunk size: 13
Chunk type: IHDR
CRC: 9a768270
Perfect! Let’s loop through the entire file until we reach the EOF:
Chunk size: 13
Chunk type: IHDR
CRC: 9a768270
Chunk size: 218087
Chunk type: IDAT
CRC: e11d26bc
Chunk size: 0
Chunk type: IEND
CRC: ae426082
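A rough Python 3 equivalent of that loop, written as a generator over an open binary file (iter_chunks is a hypothetical name). It stops after IEND or when the data runs out. The demo feeds it a tiny hand-built file with a made-up teSt chunk instead of the real image.

```python
import io
import struct

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def iter_chunks(f):
    """Yield (size, chunk_type, crc_hex) for each chunk of a binary PNG stream."""
    if f.read(8) != PNG_SIGNATURE:
        raise ValueError('not a PNG file')
    while True:
        header = f.read(8)
        if len(header) < 8:  # data ran out before an IEND chunk
            return
        size, chunk_type = struct.unpack('>I4s', header)
        f.seek(size, io.SEEK_CUR)  # skip over the chunk content
        crc = f.read(4)
        yield size, chunk_type.decode('ascii'), crc.hex()
        if chunk_type == b'IEND':
            return


demo = (PNG_SIGNATURE
        + struct.pack('>I', 3) + b'teSt' + b'abc' + b'\x00\x00\x00\x00'
        + b'\x00\x00\x00\x00IEND\xae\x42\x60\x82')
for size, chunk_type, crc in iter_chunks(io.BytesIO(demo)):
    print('Chunk size:', size, 'Chunk type:', chunk_type, 'CRC:', crc)
```

With a real file, the same generator works on open(path, 'rb').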
Injecting the payload
I’m a lazy man, so let’s inject our puNk payload at the end, right before the IEND chunk.
if chunk_type == self._END_CHUNK_TYPE:  # IEND
    self._inject_punk_chunk()
    self._file.close()
Diving into _inject_punk_chunk: first we need to move the cursor in the file back by 8 bytes. It’s 8 bytes because the IEND chunk’s 4-byte chunk size and 4-byte chunk type are what we need to overwrite.
The CRC is a cyclic redundancy check computed over the chunk type and the chunk content, not the size. So let’s create a new byte array holding both, so we can easily compute this CRC.
tmp_bytes = bytearray()
tmp_bytes.extend(bytearray(self._PUNK_CHUNK_TYPE))
tmp_bytes.extend(self._bytes_to_hide)
Now with this ready, we can start writing to the file:
chunk_size = len(self._bytes_to_hide)  # the chunk size is the payload length
self._file.write(bytearray(struct.pack('!i', chunk_size)))
self._file.write(bytearray(self._PUNK_CHUNK_TYPE))
self._file.write(self._bytes_to_hide)
Notice I’m using struct.pack here: the chunk size has to be written to the file as a 4-byte integer, not as the digits of the number. The ! specifies network byte order, which is big-endian.
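To see why struct.pack is needed, compare the 4 bytes it produces with the ASCII digits of the same number. A quick Python 3 illustration, using 13 (the IHDR size from the output above) as an example value:

```python
import struct

size = 13

as_text = str(size).encode('ascii')      # 2 ASCII digits: not what PNG expects
big_endian = struct.pack('!I', size)     # 4 bytes, most significant byte first
little_endian = struct.pack('<I', size)  # 4 bytes, least significant first (wrong for PNG)

print(as_text, big_endian, little_endian)
```

The original snippet uses '!i' (signed). Valid PNG chunk sizes never exceed 2**31 - 1, so for them the signed and unsigned formats produce the same bytes.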
Now we have to write the CRC bytes. binascii.crc32 returns an integer, which has to be written as 4 bytes, so again we use struct.pack. (In Python 2, crc32 can return a negative number, which the signed i format handles.)
crc = binascii.crc32(tmp_bytes)
self._file.write(bytearray(struct.pack('!i', crc)))
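In Python 3, binascii.crc32 always returns an unsigned value, so it has to be packed with '!I'; '!i' raises struct.error for any CRC of 2**31 or above. A small sketch (chunk_crc_bytes is a hypothetical helper), checked against the IEND CRC ae426082 from the output above:

```python
import binascii
import struct


def chunk_crc_bytes(chunk_type: bytes, content: bytes) -> bytes:
    """4-byte big-endian CRC-32 over the chunk type and content (not the size)."""
    crc = binascii.crc32(chunk_type + content)  # unsigned in Python 3
    return struct.pack('!I', crc)


print(chunk_crc_bytes(b'IEND', b'').hex())  # ae426082
```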
And last but not least, we write the IEND chunk back:
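self._file.write(bytearray(struct.pack('!i', 0)))
self._file.write(bytearray(self._END_CHUNK_TYPE))
# IEND needs its CRC too; the chunk is empty, so it's always ae426082
self._file.write(bytearray(struct.pack('!i', binascii.crc32(self._END_CHUNK_TYPE))))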
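For comparison, here is a Python 3 sketch of the whole injection that works on the file contents in memory instead of seeking back 8 bytes. It assumes nothing follows the IEND chunk, and it writes IEND back including its CRC. make_chunk and inject_before_iend are hypothetical names, not from the author’s gist.

```python
import binascii
import struct

IEND_CHUNK = b'\x00\x00\x00\x00IEND\xae\x42\x60\x82'  # empty IEND chunk incl. CRC
MAX_CHUNK_SIZE = 2**31 - 1  # PNG chunk sizes may not exceed this


def make_chunk(chunk_type: bytes, content: bytes) -> bytes:
    """Serialize a chunk: size, type, content, CRC over type + content."""
    crc = binascii.crc32(chunk_type + content)
    return struct.pack('!I', len(content)) + chunk_type + content + struct.pack('!I', crc)


def inject_before_iend(png: bytes, chunk_type: bytes, payload: bytes) -> bytes:
    """Return a copy of `png` with a new chunk inserted just before IEND."""
    if len(payload) > MAX_CHUNK_SIZE:
        raise ValueError('payload too large for a single chunk')
    if not png.endswith(IEND_CHUNK):
        raise ValueError('expected the file to end with an IEND chunk')
    return png[:-len(IEND_CHUNK)] + make_chunk(chunk_type, payload) + IEND_CHUNK
```

Usage would be along the lines of reading the carrier PNG with open(path, 'rb').read(), calling inject_before_iend(data, b'puNk', payload), and writing the result to a new file, which also leaves the original image untouched.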
Okay, that should be it! Let’s try to inject an image as payload. Because I like dead memes, we’ll use a Doge meme, doge.jpg.
Run the script that loops through the chunks and injects the payload at the end:
Chunk size: 13
Chunk type: IHDR
CRC: 9a768270
Chunk size: 218087
Chunk type: IDAT
CRC: e11d26bc
Chunk size: 0
Chunk type: IEND
CRC: ae426082
Hiding 27 kB ( 28208 bytes)
Injecting punk chunk
Punk chunk injected
Reached EOF
Looping through the chunks to see if the chunk got injected properly:
Chunk size: 13
Chunk type: IHDR
CRC: 9a768270
Chunk size: 218087
Chunk type: IDAT
CRC: e11d26bc
Chunk size: 28208
Chunk type: puNk
CRC: 8cccb594
Chunk size: 0
Chunk type: IEND
Reached EOF
Excellent! When I open the file, I see the dice and no Doge. Exactly what we expected.
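Looking at the image only shows that viewers ignore the extra chunk. A stricter check is to recompute every chunk’s CRC. A Python 3 sketch (bad_crc_chunks is a hypothetical helper); an IEND chunk that is missing its 4 CRC bytes gets reported as well.

```python
import binascii
import struct


def bad_crc_chunks(png: bytes) -> list:
    """Return the types of chunks whose stored CRC doesn't match their data."""
    bad = []
    offset = 8  # skip the PNG signature
    while offset + 8 <= len(png):
        size, chunk_type = struct.unpack('>I4s', png[offset:offset + 8])
        content = png[offset + 8:offset + 8 + size]
        stored = png[offset + 8 + size:offset + 12 + size]
        expected = struct.pack('>I', binascii.crc32(chunk_type + content))
        if stored != expected:
            bad.append(chunk_type.decode('ascii'))
        offset += 12 + size  # size + type + content + CRC
    return bad
```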
Getting our file back
Now that we have a file with a payload, we need to get it back. Our chunk parser already reads each chunk’s content, so all we need to do is check whether we encountered a puNk chunk, and if we did, write its content to a file. We create the output file like this:
self._output = open(output_file, 'wb+')
and write to it like this:
if chunk_type == self._PUNK_CHUNK_TYPE:
    print "Found a punk chunk", len(content), "bytes. Writing to file"
    self._output.write(bytearray(content))
    self._output.close()
    self._file.close()
Chunk type: puNk
CRC: 8cccb594
Found a punk chunk 28208 bytes. Writing to file
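A Python 3 sketch of the extraction side, working on the file’s bytes in memory (extract_chunk is a hypothetical name). It walks the chunks as described above and returns the content of the first puNk chunk, or None if there isn’t one.

```python
import struct


def extract_chunk(png: bytes, wanted: bytes = b'puNk'):
    """Return the content of the first chunk of type `wanted`, or None."""
    offset = 8  # skip the PNG signature
    while offset + 8 <= len(png):
        size, chunk_type = struct.unpack('>I4s', png[offset:offset + 8])
        if chunk_type == wanted:
            return png[offset + 8:offset + 8 + size]
        if chunk_type == b'IEND':
            break
        offset += 12 + size  # size + type + content + CRC
    return None
```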
Quick MD5 check to see if the files are equal:
md5 doge.jpg doge_from_punk.jpg
MD5 (doge.jpg) = 9023d02eefc75f4c6ce177795e620b29
MD5 (doge_from_punk.jpg) = 9023d02eefc75f4c6ce177795e620b29
Sweet! We’ve just hidden an ancient meme inside of a picture of 3 dice.
Distributing it to Imgur
The goal of the project was to store these files in broad daylight without anyone suspecting a thing. Time to upload the file to Imgur. Here she is in all her glory:
Hidden underneath is a Doge meme… or is it?
Let’s find out:
> wget http://i.imgur.com/Qk5BP19.png
> md5 Qk5BP19.png png_out.png
MD5 (Qk5BP19.png) = ba56411b9753a9ff2dc4aa74d079e4c8
MD5 (png_out.png) = ba56411b9753a9ff2dc4aa74d079e4c8
For good measure, let’s extract the payload. I’ve written a Punk class by now:
punk = Punk()
punk.decode('Qk5BP19.png', 'doge_from_imgur.jpg')
And an MD5 hash check:
md5 doge.jpg doge_from_imgur.jpg
MD5 (doge.jpg) = 9023d02eefc75f4c6ce177795e620b29
MD5 (doge_from_imgur.jpg) = 9023d02eefc75f4c6ce177795e620b29
We can now store any kind of arbitrary data on other people’s servers, without them ever knowing about it.
All this code works, but it’s a quickly written POC. It can no doubt be optimized and made to handle larger files. A single PNG chunk can hold at most 2^31 - 1 bytes (just under 2 GB), and most image hosts only accept files of a few megabytes anyway.
For the future:
- Come up with a format to distribute a file over multiple PNGs
- Make it redundant, allow for uploading to multiple sources
- Add GPG encryption options for an added layer of security
And finally, you can find the gist with all the code here.
from punk import Punk

punk = Punk()
# First param is the file name, 2nd param is the bytes you want to inject.
punk.encode('png_out.png', open('doge.jpg', 'rb').read())
# First param is the file name, 2nd param is the output file name.
punk.decode('png_out.png', 'doge.jpg')
No external libraries needed. Because I’m awesome like that.