Hidden Data in Document Formats

In light of the recent DoD blunder regarding redacted documents, I will try to sum up what I have done in this area so far. The problem is that all kinds of documents contain information they should not contain. MS Word documents contain previous revisions of a document, as well as information about servers, filenames and authors. PDF files allow redacted text to reappear. JPEGs can contain thumbnails of the uncropped original image. Web pages contain local paths, and so on.
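To give a feel for how little effort it takes to find such leftovers, here is a crude Python sketch that scans raw document bytes for Windows drive-letter paths (`C:\...`) and UNC server paths (`\\server\share\...`). This is not one of the tools mentioned in this post. The function name `find_paths` and the regular expression are my own assumptions, and the heuristic will miss paths stored in other encodings or with forward slashes. It checks the bytes both as 8-bit text and as UTF-16LE, since binary Word files typically store strings in the latter.

```python
import re

# Hypothetical heuristic: a drive letter ("C:\") or a UNC prefix ("\\server\"),
# followed by characters that are legal in Windows paths (spaces allowed).
_PATH_RE = re.compile(r'(?:[A-Za-z]:\\|\\\\[A-Za-z0-9._$-]+\\)[^\x00-\x1f"<>|*?]{2,200}')


def find_paths(data: bytes) -> list:
    """Return candidate Windows paths found in raw document bytes."""
    found = []
    # View 1: 8-bit text (HTML, RTF, many PDF strings).
    views = [data.decode("latin-1")]
    # Views 2 and 3: UTF-16LE at both byte alignments, since a string
    # may start at an even or an odd offset inside a binary file.
    for start in (0, 1):
        chunk = data[start:]
        chunk = chunk[: len(chunk) - len(chunk) % 2]
        views.append(chunk.decode("utf-16-le", errors="replace"))
    for text in views:
        for match in _PATH_RE.finditer(text):
            path = match.group(0).rstrip()
            if path not in found:
                found.append(path)
    return found
```

Usage would be along the lines of `find_paths(open("report.doc", "rb").read())`. Expect some false positives and noise; the point is only that no special tooling is needed to spot local paths and server names.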

I have presented on this topic on various occasions so far.

Our claim to fame is that we researched the problem of JPEG thumbnails to some degree. We, the RedTeam Pentesting group at RWTH-Aachen University, published an advisory on this issue: "Advisory: JPEG EXIF information disclosure". The EXIF issue also has a CVE number: CAN-2005-0406. We should probably get a CVE number for the MS Office issues, too. We also have an application for screening thumbnails, which is somewhat fun. Some examples of what to expect can be seen here.
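For readers wondering where the uncropped image hides: the EXIF APP1 segment of a JPEG contains a TIFF structure whose second directory (IFD1) usually points to a small embedded JPEG thumbnail via the tags 0x0201 (JPEGInterchangeFormat, the offset) and 0x0202 (JPEGInterchangeFormatLength, the size). If an editing program crops or retouches the main image but does not regenerate this thumbnail, the original stays visible there. Below is a minimal Python sketch of pulling that thumbnail out. It is not our screening application; the function names are hypothetical, and it assumes a well-formed EXIF block without any error recovery.

```python
import struct
from typing import Optional


def extract_exif_thumbnail(jpeg: bytes) -> Optional[bytes]:
    """Return the JPEG thumbnail embedded in the EXIF APP1 segment, if any."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    pos = 2
    # Walk the marker segments that precede the compressed image data.
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker in (0xD9, 0xDA):  # end of image / start of scan: no more metadata
            break
        seg_len = struct.unpack(">H", jpeg[pos + 2:pos + 4])[0]
        payload = jpeg[pos + 4:pos + 2 + seg_len]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return _thumbnail_from_tiff(payload[6:])
        pos += 2 + seg_len
    return None


def _thumbnail_from_tiff(tiff: bytes) -> Optional[bytes]:
    # "II" = little-endian, "MM" = big-endian; applies to all following values.
    order = {b"II": "<", b"MM": ">"}.get(tiff[:2])
    if order is None:
        return None
    ifd0 = struct.unpack(order + "I", tiff[4:8])[0]
    count = struct.unpack(order + "H", tiff[ifd0:ifd0 + 2])[0]
    # The "next IFD" offset after IFD0's 12-byte entries points to IFD1.
    next_pos = ifd0 + 2 + 12 * count
    ifd1 = struct.unpack(order + "I", tiff[next_pos:next_pos + 4])[0]
    if ifd1 == 0:
        return None
    count = struct.unpack(order + "H", tiff[ifd1:ifd1 + 2])[0]
    offset = length = None
    for i in range(count):
        entry = tiff[ifd1 + 2 + 12 * i:ifd1 + 14 + 12 * i]
        tag, _type, _count, value = struct.unpack(order + "HHII", entry)
        if tag == 0x0201:    # JPEGInterchangeFormat: offset of the thumbnail
            offset = value
        elif tag == 0x0202:  # JPEGInterchangeFormatLength: size in bytes
            length = value
    if offset is None or length is None:
        return None
    # Offsets are relative to the start of the TIFF header.
    return tiff[offset:offset + length]
```

Writing the returned bytes to a separate `.jpg` file and comparing it with the visible image is essentially what thumbnail screening boils down to.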

In the comments on blog entries about the hidden data issue, another tool came up besides Richard Smith's famous WordDumper: WordLeaker. I haven't tried it yet. There is also a tool called revisionist by lcamtuf, which I haven't tried yet either.

Previous postings on the issue can be found at http://blogs.23.nu/disLEXia/topics/HiddenData/
