In light of the recent DOD blunder regarding redacted documents, I will try to sum up what I have done in that area so far. The problem is that all kinds of documents contain information they should not contain. MS Word documents contain previous revisions of a document as well as information about servers, filenames and authors. PDF files allow redacted text to reappear. JPEGs contain uncropped original images. Web pages contain local paths, and so on.
I have presented on the topic on various occasions so far:
- Far more than you ever wanted to tell – hidden data in document formats, presented at Defcon 12.
- Versteckte Daten in Dokumentformaten ("Hidden Data in Document Formats"), presented at the CCCC.
- Hidden Data in Document Formats, presented at the Aachen Summerschool Applied IT
- Versteckte Daten in Dokumentformaten, presented during my last Forensics class.
- Hidden Data in Internet Published Documents, presented at the CCC.
Our claim to fame is that we researched the problem with JPEG thumbnails to some degree. We (the RedTeam Pentesting group at RWTH-Aachen University) published an advisory on this issue: Advisory: JPEG EXIF information disclosure. The EXIF issue also has a CVE number: CAN-2005-0406. We should probably get a CVE number for the MS Office issues, too. We also have an application for screening thumbnails, which is somewhat fun. Some examples of what to expect can be seen here.
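To make the thumbnail issue a bit more concrete, here is a minimal Python sketch of how an embedded EXIF thumbnail can be pulled out of a JPEG file. It is not the screening application mentioned above, just an illustration under a few assumptions: the thumbnail is stored as a JPEG in the second image directory (IFD1) of the EXIF block and is referenced by the usual tags 0x0201 (offset) and 0x0202 (length). The function names `extract_exif_thumbnail` and `_thumbnail_from_tiff` are made up for this example.

```python
import struct
from typing import Optional


def extract_exif_thumbnail(jpeg: bytes) -> Optional[bytes]:
    """Return the raw bytes of the embedded EXIF thumbnail, or None."""
    if jpeg[:2] != b"\xff\xd8":
        return None  # no SOI marker, so not a JPEG
    pos = 2
    # Walk the marker segments that precede the compressed image data.
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        (seg_len,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        payload = jpeg[pos + 4:pos + 2 + seg_len]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return _thumbnail_from_tiff(payload[6:])
        if marker == 0xDA:  # start of scan: no metadata segments after this
            break
        pos += 2 + seg_len
    return None


def _thumbnail_from_tiff(tiff: bytes) -> Optional[bytes]:
    """Follow IFD0 -> IFD1 in the TIFF structure and cut out the thumbnail."""
    endian = {b"II": "<", b"MM": ">"}.get(tiff[:2])
    if endian is None:
        return None
    try:
        (ifd0,) = struct.unpack(endian + "I", tiff[4:8])
        (count,) = struct.unpack(endian + "H", tiff[ifd0:ifd0 + 2])
        # The pointer to the next directory follows the 12-byte entries.
        next_ptr = ifd0 + 2 + 12 * count
        (ifd1,) = struct.unpack(endian + "I", tiff[next_ptr:next_ptr + 4])
        if ifd1 == 0:
            return None  # no second directory, hence no thumbnail
        (count,) = struct.unpack(endian + "H", tiff[ifd1:ifd1 + 2])
        offset = length = None
        for i in range(count):
            start = ifd1 + 2 + 12 * i
            entry = tiff[start:start + 12]
            tag, typ, _n = struct.unpack(endian + "HHI", entry[:8])
            # Type 3 is a 16-bit SHORT; otherwise read a 32-bit LONG.
            fmt, size = ("H", 2) if typ == 3 else ("I", 4)
            (value,) = struct.unpack(endian + fmt, entry[8:8 + size])
            if tag == 0x0201:
                offset = value
            elif tag == 0x0202:
                length = value
    except struct.error:
        return None  # truncated or malformed EXIF data
    if offset is None or length is None or offset + length > len(tiff):
        return None
    return tiff[offset:offset + length]
```

Writing the returned bytes to a file and opening it next to the published image is usually enough to see whether the thumbnail still shows the uncropped original.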
In the comments on blog entries about the hidden data issue, another tool came up besides Richard Smith’s famous WordDumper: WordLeaker. I haven’t tried it so far. There is also a tool called revisionist by lcamtuf, which I haven’t tried either.
Previous postings on the issue can be found at http://blogs.23.nu/disLEXia/topics/HiddenData/