Abstract
Sanitizing a Word document for release involves a number of pitfalls. This paper describes
the problem and gives a step-by-step procedure for sanitizing a document with confidence
that inappropriate material will not be released.
SUMMARY
Both the Microsoft Word document format (MS Word) and Adobe Portable Document Format (PDF)
are complex, sophisticated computer data formats. They can contain many kinds of information,
such as text, graphics, tables, images, metadata, and more, all mixed together. The complexity
makes them potential vehicles for exposing information unintentionally, especially when
downgrading or sanitizing classified materials. Although the focus here is on MS Word, the
general guidance applies to other word processors and office tools, such as WordPerfect,
PowerPoint, Excel, StarOffice, etc.
This document does not address all the issues that can arise when distributing or downgrading
original document formats such as MS Word or MS PowerPoint. Using original source formats,
such as MS Word, for downgrading can entail exceptional risks; the lengthy and complicated
procedures for mitigating such risks are outside the scope of this note.
DETAILS
MS Word is used throughout the DoD and the Intelligence Community (IC) for preparing
documents, reports, notes, and other formal and informal materials. Commonly used versions of
MS Word include Word 2000, Word XP, and Word 2003.
Adobe PDF is used very extensively by all parts of the U.S. Government and military services
for disseminating and distributing documents of all kinds. PDF provides excellent fidelity and
portability, and allows easy distribution of documents over computer networks and the Internet.
PDF files are usually produced using commercial conversion software (so-called “distillers”) that
accepts source formats such as PostScript or MS Word, and outputs PDF. PDF is often used as the
format for downgraded or sanitized documents.
As numerous people have learned to their chagrin, merely converting an MS Word document to
PDF does not remove all metadata automatically. In addition, Adobe Distiller and the
PDFMaker Add-in to MS Word (the most common conversion tools) carry much of the layering
complexity over from one format to the other. For example, an image placed on top of text in
MS Word will be copied verbatim to PDF with the same layout, and the text beneath it remains
in the file.
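As an illustration of metadata surviving conversion, the following minimal Python sketch scans the raw bytes of a PDF file for uncompressed entries of the document information dictionary (Title, Author, Subject, Keywords, Creator, Producer). It is not part of the procedure in this paper, and it is not a PDF parser: the function name and the sample bytes are hypothetical, and the sketch assumes entries stored as plain literal strings. It will miss entries held in compressed object streams, hex strings, or XMP packets, so an empty result does not show that a file is clean.

```python
import re

# Standard keys of the PDF document information dictionary.
INFO_KEYS = (b"Title", b"Author", b"Subject", b"Keywords", b"Creator", b"Producer")

# Matches "/Key (literal string)". Backslash-escaped characters, including
# escaped parentheses, are allowed inside the string. Nested unescaped
# parentheses and hex strings are NOT handled (assumption: simple entries).
_PATTERN = re.compile(
    rb"/(" + b"|".join(INFO_KEYS) + rb")\s*\(((?:\\.|[^\\()])*)\)"
)


def find_info_entries(pdf_bytes: bytes) -> list[tuple[str, str]]:
    """Return (key, value) pairs for plain-text Info entries in the raw bytes."""
    found = []
    for match in _PATTERN.finditer(pdf_bytes):
        key = match.group(1).decode("ascii")
        # latin-1 maps every byte to a character, so decoding cannot fail.
        value = match.group(2).decode("latin-1")
        found.append((key, value))
    return found


if __name__ == "__main__":
    # Made-up fragment standing in for the contents of a converted file.
    sample = b"1 0 obj\n<< /Author (J. Smith) /Title (Draft report) >>\nendobj\n"
    for key, value in find_info_entries(sample):
        print(f"{key}: {value}")
```

To check a real file, the bytes could be read with `open(path, "rb").read()` and passed to `find_info_entries`. Any entry reported is information that would accompany the document on release.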
Typical Kinds of Exposures
When attempting to sanitize a document, analysts commit three common mistakes with MS
Word and PDF that lead to most cases of unintentional exposure.
1. Redaction of Text and Diagrams - Covering text, charts, tables, or diagrams with black
rectangles, or highlighting text in black, is a common and effective means of redaction
for hardcopy printed materials. It is not effective, in general, for computer documents
distributed across computer networks (i.e., in “softcopy” format). The most common
mistake is covering text with black.