A while ago, a friend asked me if it would be possible to embed a video in a PDF report, so that he could have an animated diagram of some algorithm he was writing up. This is not a reasonable use of PDFs, but it got me thinking – instead of embedding a video in the document, could we make the document double as a video?
As it turns out, we can, at least for some PDF viewers. Try my demo here – confirmed to work in Firefox (with pdf.js) as a PDF and in ffmpeg as an AVI. It is a fully standards-conforming AVI file, but not an entirely valid PDF, and relies on the reader being lenient enough to accept headers and footers in invalid locations.
In theory, the PDF standard says that a valid file begins with a header, which starts with %PDF. In reality, many parsers are more lenient, to make sure documents written by non-conforming software can be read, and will allow up to a kilobyte of data before the header. Everything before the first %PDF will be skipped.
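As a rough illustration of this lenient behaviour, here is a minimal Python sketch. The function name find_pdf_header and the window parameter are made up for this example, and the 1024-byte limit is simply the "kilobyte" figure mentioned above; real readers may differ in the details.

```python
def find_pdf_header(data: bytes, window: int = 1024) -> int:
    """Return the offset of the first %PDF marker, or -1 if none is found.

    Only matches that lie entirely within the first `window` bytes count,
    mimicking a parser that tolerates some junk before the header.
    """
    return data.find(b"%PDF", 0, window)
```

A parser written this way would start reading the document at the returned offset and treat everything before it as noise.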
This means that our polyglot file doesn’t have to start with a valid PDF header, as long as we can put the header somewhere near the start of the file. The easiest way to embed the document is to put its entire contents there: as long as the header is recognised, the rest of the document will be read correctly, and the PDF parser will stop when it encounters the %%EOF end-of-file marker.
In practice, this is not quite true for all readers, but is good enough to have a working polyglot in Firefox. Some readers will look for an xref table (which is between the document body and the EOF marker) by starting at the end of file and seeking backwards, instead of reading the entire body first. This will fail on our file, whose xref table is nowhere near the end of file, since the PDF data is followed by the video contents.
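To see why such readers fail, here is a hedged sketch of the "search backwards from the end" strategy. In a PDF, the offset of the xref table is announced by the startxref keyword just before %%EOF; the function name and the 1024-byte tail window below are assumptions made for this example, not the behaviour of any particular reader.

```python
def find_startxref_from_end(data: bytes, tail: int = 1024):
    """Return the xref offset announced near the end of the file, or None.

    Only the last `tail` bytes are searched, as a strict reader might do.
    """
    start = max(0, len(data) - tail)
    pos = data.rfind(b"startxref", start)
    if pos == -1:
        return None
    # The keyword is followed by the byte offset of the xref table.
    fields = data[pos + len(b"startxref"):].split()
    if fields and fields[0].isdigit():
        return int(fields[0])
    return None
```

On an ordinary PDF this finds the offset, but once a few kilobytes of video data follow the %%EOF marker the keyword falls outside the searched window and the lookup comes back empty.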
Fixing this is probably possible, but would require a lot more care. One option would be to rewrite the PDF to move the xref to the end of the file and somehow “hide” the junk AVI data in the middle from the parser. Alternatively, it is possible to have multiple xref tables, with the last one containing the offset at which the previous one can be found (“incremental updates”, meant to support editing a document by appending the changes). Using this feature, it would be enough to embed a new table near the end of the file, which would point to the original.
It turns out that the AVI (“Audio Video Interleave”) format is a special case of the RIFF container format, which stores data in a file as a series of chunks. Ignoring some features we don’t care about here, an AVI file looks like the following:
"RIFF" size type
CID1 size1 data1
CID2 size2 data2
...
In the header, size is the number of bytes that follow the size field itself (that is, the total file size minus eight), and type is “AVI ” (with a trailing space, to make up four bytes). The rest of the file consists of the chunks, each of which starts with a four-byte type identifier and a four-byte size, followed by the chunk data.
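The layout described above can be walked with a few lines of Python. This is only a sketch: the function name riff_chunks is made up for this example, it looks at top-level chunks only, and it relies on two RIFF conventions not spelled out in the text, namely that sizes are little-endian 32-bit integers and that chunks with an odd size are followed by one padding byte.

```python
import struct

def riff_chunks(data: bytes):
    """Return (form_type, [(chunk_id, offset, size), ...]) for a RIFF file."""
    magic, size, form = struct.unpack_from("<4sI4s", data, 0)
    if magic != b"RIFF":
        raise ValueError("not a RIFF file")
    chunks = []
    pos = 12                          # first chunk follows the 12-byte header
    end = min(len(data), 8 + size)    # size counts everything after byte 8
    while pos + 8 <= end:
        cid, csize = struct.unpack_from("<4sI", data, pos)
        chunks.append((cid, pos, csize))
        pos += 8 + csize + (csize & 1)  # skip header, data and pad byte
    return form, chunks
```

For an AVI file, the returned form type should be b"AVI ", and any JUNK chunks will show up in the list alongside the chunks that carry the actual video.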
We don’t care too much about what the chunks look like, since we can just copy them from an existing video file. The only type we are particularly interested in is the JUNK chunk, which is ignored by the parser, meaning that we can use it to put arbitrary data anywhere in the file, except for the first 20 bytes (we need twelve for the file header and eight for the chunk header).
As we saw, many PDF readers will accept a document as long as the header starts within the first kilobyte. We can achieve this by inserting a JUNK chunk immediately after the file header, followed by the chunks of the original video. The resulting file would look like the following:
<header> JUNK <len(pdf)> <pdf data> <video data>
The only thing we need to fix in the original file is the length field in the header, which needs to be increased by the size of the PDF-containing chunk.
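Putting the pieces together, the construction can be sketched as follows. This is an illustration under stated assumptions rather than the exact script used for the demo: the function name make_polyglot is made up, and the pad byte after an odd-length PDF is added here to follow the RIFF convention of keeping chunks aligned to even offsets.

```python
import struct

def make_polyglot(avi: bytes, pdf: bytes) -> bytes:
    """Insert `pdf` as a JUNK chunk right after the RIFF header of `avi`."""
    magic, size, form = struct.unpack_from("<4sI4s", avi, 0)
    if magic != b"RIFF" or form != b"AVI ":
        raise ValueError("expected a RIFF/AVI file")
    # RIFF chunks are padded to an even length; the pad byte is not
    # counted in the chunk's own size field.
    pad = b"\x00" * (len(pdf) & 1)
    junk = b"JUNK" + struct.pack("<I", len(pdf)) + pdf + pad
    # The header's size field grows by the full size of the new chunk.
    header = b"RIFF" + struct.pack("<I", size + len(junk)) + b"AVI "
    return header + junk + avi[12:]
```

With this layout the %PDF marker lands at byte offset 20, well inside the first kilobyte, while the original video chunks are copied through unchanged.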
So far, I have confirmed that the document works in Firefox, and is rejected by qpdfview, which expects the xref table near the end of the file. Although the xref issue can probably be fixed, it is not possible to create a document that is both a fully standards-conforming PDF and a fully standards-conforming AVI, so polyglots like this one will always have to rely on parser leniency.
RIFF containers are flexible about embedding arbitrary data, but always have to begin with a twelve-byte header, starting with the RIFF magic string. Similarly, a valid PDF document begins with %PDF (and a few bytes defining the version). If a PDF reader required this magic string to actually be in the first four bytes, it would never accept a document that’s also a valid AVI.