Chapter 1

QUESTIONED ELECTRONIC DOCUMENTS: EMPIRICAL STUDIES IN AUTHORSHIP ATTRIBUTION

Patrick Juola

Abstract

Forensic analysis of questioned electronic documents is very difficult because the nature of the documents eliminates many kinds of informative differences. Recent work in authorship attribution demonstrates the practicality of analyzing documents based on authorial style, but the state of the art is confusing. Analyses are difficult to apply, little is known about type or rate of errors, and no best practices are available. We present the results of some recent experiments and software development to address these issues, partly through the development of a systematic testbed for multilingual, multigenre authorship attribution accuracy, and partly through the development and concurrent analysis of a uniform and portable software tool that applies multiple methods to analyze electronic documents for authorship based on authorial style.

Keywords: Authorship attribution, stylometrics, software development, text forensics

1. Introduction

The forensic importance of questioned documents is well understood: did Aunt Martha really write this disputed version of her will? Document examiners can look at handwriting (or typewriting) and determine authorship with near-miraculous sophistication from the dot of an "i" or the cross of a "t." Electronic documents do not contain these clues. Any two flat-ASCII "A" characters are identical. How can one determine who made a defamatory but anonymous post on a blog, for example? Whether the authorship of a purely electronic document can be demonstrated to the demanding standards of a Daubert [7] hearing is an open, but important, research question.

2. The Problem

With the advent of modern computer technology, a substantial amount of writing today never involves pen, ink or paper. This very paper is a good example: born as a PDF file, the first time these words see paper is in the bound volume. If my authorship of these words were challenged, I have no physical

