Automating Document Review
Nathaniel Love
CS 244n Final Project Presentation
6/14/2006

Document Review
• Litigation cases, government investigations
• Discovery process: a company involved in a case is compelled to produce documents (internal memos, financial statements, email) in response to a discovery request.
• The company doesn’t want to release everything, only those documents that are
   • responsive to the discovery request, and
   • not privileged, meaning not protected under attorney-client privilege.
• The company’s attorneys must review all documents before they are produced.
• In a large litigation case, this may be ~500,000 documents.

Classification Problem
• 500,000 emails to review
• Inspection by attorneys at ~100 emails/hr, $275/hr
• $1.375 million to pay for document review in a single case
• Improving this process:
   • Each email must be classified as
      • responsive / non-responsive
      • privileged / non-privileged
   • As attorneys review, train two MaxEnt classifiers, one for each distinction.
   • Organize documents using the partially trained classifiers.
   • Present the sorted documents to attorneys, with suggested classifications.
   • Run the trained classifiers on all previously reviewed documents to check for errors.

Feature Selection / Data
• Emails: sender, recipient, date, words/word pairs in the subject, presence/type of attachments…
• Hand-built features: added based on concepts relevant to the discovery request
• Enron Corpus: a close match for the data seen in an actual document review process
• Test and training data drawn from hand-tagged Enron emails (work done by a Berkeley group)
• Berkeley categories mapped into responsive/privileged categories based on the FERC investigation into Enron (concerning manipulation of energy markets in the western U.S.)
• Issues
   • Small data set overall (1,700 documents tagged out of over 600,000 in the corpus)
   • Poor data for the privilege classifier: the tagged documents contain a much smaller share of privileged emails than the corpus overall

Results
• Accuracy:
   • 75% (responsive)
   • 93% (privileged)
   • Accuracy improved with more training.
• Positive feedback from attorneys on use of the system, especially on how the classifier organized and presented documents as it trained.
• Weights on features (responsive classifier):
   • [email protected] (high positive weight)
   • [email protected] (high negative weight)
   • David Parquet was Enron’s Vice President for project development in the western U.S.
   • Nicholas O’Day was a Vice President at Enron.
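The $1.375 million figure on the Classification Problem slide follows directly from the stated review rate and billing rate:

```latex
\[
\frac{500{,}000\ \text{emails}}{100\ \text{emails/hr}} = 5{,}000\ \text{hr},
\qquad
5{,}000\ \text{hr} \times \$275/\text{hr} = \$1{,}375{,}000
\]
```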
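The Feature Selection / Data slide lists the email fields used as features. The sketch below shows one way to turn an email into a set of binary indicator features. It is an illustration under stated assumptions, not the project's code: the `Email` record, the `name=value` feature-string scheme, and the `CONCEPTS` keyword table standing in for the hand-built concept features are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Email:
    # Hypothetical minimal email record; field names are illustrative only.
    sender: str
    recipients: list[str]
    sent: date
    subject: str
    attachments: list[str] = field(default_factory=list)  # attachment filenames


# Hypothetical stand-in for hand-built concept features:
# concept name -> words in the subject that trigger it.
CONCEPTS = {
    "energy_trading": {"trading", "megawatt", "price", "california"},
}


def extract_features(email: Email) -> set[str]:
    """Return the set of binary features that are active for one email."""
    feats = {f"from={email.sender.lower()}"}
    feats.update(f"to={r.lower()}" for r in email.recipients)
    # Coarsen the date to year-month so the feature generalizes across days.
    feats.add(f"month={email.sent.year}-{email.sent.month:02d}")
    words = email.subject.lower().split()
    feats.update(f"subj_word={w}" for w in words)
    # Adjacent word pairs from the subject line.
    feats.update(f"subj_pair={a}_{b}" for a, b in zip(words, words[1:]))
    if email.attachments:
        feats.add("has_attachment")
        for name in email.attachments:
            ext = name.rsplit(".", 1)[-1].lower() if "." in name else "none"
            feats.add(f"attach_type={ext}")
    # Hand-built concept features fire when any trigger word appears.
    for concept, triggers in CONCEPTS.items():
        if triggers & set(words):
            feats.add(f"concept={concept}")
    return feats
```

Representing each email as a set of feature strings keeps the model sparse: only the handful of features actually present in a message need to be stored or scored.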
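The slides do not say how the two MaxEnt classifiers were implemented or trained. A two-class maximum entropy model over binary indicator features is equivalent to logistic regression, so a minimal sketch, assuming features given as sets of strings (for example `"from=..."`) and one online update per attorney decision, might look like this. The class name, learning rate, and L2 penalty are illustrative choices, not values from the project.

```python
import math
from collections import defaultdict


class BinaryMaxEnt:
    """Two-class maximum entropy model over sparse binary features.

    Equivalent to L2-regularized logistic regression, trained here by
    stochastic gradient ascent on the conditional log-likelihood.
    """

    def __init__(self, learning_rate: float = 0.1, l2: float = 0.01):
        self.w: dict[str, float] = defaultdict(float)
        self.bias = 0.0
        self.lr = learning_rate
        self.l2 = l2

    def prob(self, feats: set[str]) -> float:
        """Return P(label = 1 | feats); unseen features contribute 0."""
        z = self.bias + sum(self.w.get(f, 0.0) for f in feats)
        # Numerically stable logistic function.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        ez = math.exp(z)
        return ez / (1.0 + ez)

    def update(self, feats: set[str], label: int) -> None:
        """One gradient step on a single reviewed document (label 0 or 1)."""
        err = label - self.prob(feats)  # d(log-likelihood)/dz
        self.bias += self.lr * err
        # L2 shrinkage is applied only to the active features, a common
        # simplification for sparse online training.
        for f in feats:
            self.w[f] += self.lr * (err - self.l2 * self.w[f])

    def fit(self, data, epochs: int = 20) -> None:
        """Train on (feature_set, label) pairs for several passes."""
        for _ in range(epochs):
            for feats, label in data:
                self.update(feats, label)

    def top_weights(self, k: int = 5):
        """Return (k most positive, k most negative) feature weights."""
        ranked = sorted(self.w.items(), key=lambda kv: kv[1])
        return ranked[-k:][::-1], ranked[:k]
```

In this setup one instance would track responsiveness and a second, independent instance would track privilege, each updated as attorneys record their decisions. `top_weights` yields the kind of sender-level weight listing shown on the Results slide.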
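The Classification Problem slide describes two uses of the partially trained classifiers: sorting unreviewed documents with suggested labels, and re-running the classifier over already-reviewed documents to catch errors. The slides do not give the sort order or the error criterion, so the sketch below assumes both. Documents are ordered most-likely-responsive first, and a reviewed document is flagged when the classifier's probability sits far on the opposite side of 0.5 from the attorney's label. `score` is any function returning a probability for a feature set; the function names, `threshold`, and `margin` are hypothetical.

```python
from typing import Callable

# Any classifier returning P(responsive | features) for a feature set.
Scorer = Callable[[set[str]], float]


def rank_for_review(unreviewed: dict[str, set[str]], score: Scorer,
                    threshold: float = 0.5):
    """Sort unreviewed documents by score and attach a suggested label.

    Returns (doc_id, probability, suggestion) tuples, most likely
    responsive first.
    """
    scored = [(doc_id, score(feats)) for doc_id, feats in unreviewed.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [
        (doc_id, p, "responsive" if p >= threshold else "non-responsive")
        for doc_id, p in scored
    ]


def flag_disagreements(reviewed: dict[str, tuple[set[str], int]],
                       score: Scorer, margin: float = 0.3) -> list[str]:
    """Return ids of reviewed documents where the classifier strongly
    disagrees with the attorney's label (1 = responsive, 0 = not)."""
    flagged = []
    for doc_id, (feats, label) in reviewed.items():
        p = score(feats)
        if (label == 1 and p < 0.5 - margin) or (label == 0 and p > 0.5 + margin):
            flagged.append(doc_id)
    return flagged
```

Sorting by probability groups similar documents together for the reviewer; an alternative, not taken from the slides, is to surface the least certain documents first (probability closest to 0.5) so each attorney decision teaches the classifier the most.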

