0x1A Great Papers in Computer Security
Vitaly Shmatikov
CS 380S
http://www.cs.utexas.edu/~shmat/courses/cs380s/

L. Zhuang, F. Zhou, D. Tygar
Keyboard Acoustic Emanations Revisited (CCS 2005)

Acoustic Information in Typing (slide 3)
Different keystrokes make different sounds
• Different locations on the supporting plate
• Each key is slightly different
Frequency information in the sound of the typed key can be used to learn which key it is
• Observed by Asonov and Agrawal (2004)

“Key” Observation (slide 4)
Build acoustic model for keyboard and typist
Exploit the fact that typed text is non-random (for example, English)
• Limited number of words
• Limited letter sequences (spelling)
• Limited word sequences (grammar)
This requires a language model
• Statistical learning theory
• Natural language processing

Sound of a Keystroke (slide 5) [Zhuang, Zhou, Tygar]
Each keystroke is represented as a vector of cepstrum features
• Fourier transform of the decibel spectrum
• Standard technique from speech processing

Bi-Grams of Characters (slide 6)
Group keystrokes into N clusters
Find the best mapping from cluster labels to characters
Unsupervised learning: exploit the fact that some 2-character combinations are more common
• Example: “th” vs. “tj”
• Hidden Markov Models (HMMs)
[Figure: cluster labels 5, 11, 2 mapped to characters “t”, “h”, “e”]

Add Spelling and Grammar (slide 7)
Spelling correction
Simple statistical model of English grammar
• Tri-grams of words
Use HMMs again to model

Recovered Text (slide 8)
[Figure: recovered text before and after spelling and grammar correction. Legend: errors in recovery; errors corrected by grammar]

Feedback-based Training (slide 9)
Recovered characters + language correction provide feedback for more rounds of training
Output: keystroke classifier
• Language-independent
• Can be used to recognize random sequence of keys
  – For example, passwords
• Representation of keystroke classifier
  – Neural networks, linear classification, Gaussian mixtures

Overview (slide 10)
[Diagram]
Initial training: wave signal (recorded sound), Feature Extraction, Unsupervised Learning, Language Model Correction, Sample Collector, Classifier Builder. Outputs: keystroke classifier, recovered keystrokes.
Subsequent recognition: wave signal, Feature Extraction, Keystroke Classifier, Language Model Correction (optional). Output: recovered keystrokes.

Experiment: Single Keyboard (slide 11)
Logitech Elite Duo wireless keyboard
4 data sets recorded in two settings: quiet and noisy
• Consecutive keystrokes are clearly separable
Automatically extract keystroke positions in the signal with some manual error correction

Results for a Single Keyboard (slide 12)

Datasets
         Recording length   Number of words   Number of keys
Set 1    ~12 min            ~400              ~2500
Set 2    ~27 min            ~1000             ~5500
Set 3    ~22 min            ~800              ~4200
Set 4    ~24 min            ~700              ~4300

Initial and final recognition rate (%)
           Set 1         Set 2         Set 3         Set 4
           Word  Char    Word  Char    Word  Char    Word  Char
Initial    35    76      39    80      32    73      23    68
Final      90    96      89    96      83    95      80    92

Experiment: Multiple Keyboards (slide 13)
Keyboard 1: Dell QuietKey PS/2
• In use for about 6 months
Keyboard 2: Dell QuietKey PS/2
• In use for more than 5 years
Keyboard 3: Dell Wireless Keyboard
• New

Results for Multiple Keyboards (slide 14)
12-minute recording with approx. 2300 characters

Recognition rate (%)
           Keyboard 1    Keyboard 2    Keyboard 3
           Word  Char    Word  Char    Word  Char
Initial    31    72      20    62      23    64
Final      82    93      82    94      75    90

Defenses
Physical security
Two-factor authentication
Masking noise
Keyboards with uniform sound (?)
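
Example sketch: bi-gram decoding of cluster labels
The “Bi-Grams of Characters” slide describes using an HMM to map cluster labels to characters, with the example that “th” is far more common than “tj”. The Python sketch below illustrates only the decoding half of that idea with standard Viterbi decoding. It is not the authors' implementation: the four-character alphabet, the cluster labels 5, 11 and 2 (borrowed from the slide's figure), and every probability in START, BIGRAM and EMIT are made-up toy values, and the names themselves are hypothetical. In the paper the mapping is learned without labels; here the acoustic model is simply assumed to be known.

```python
import math

# Toy values for illustration only; none of these numbers come from the paper.
CHARS = "thej"
START = {"t": 0.4, "h": 0.2, "e": 0.3, "j": 0.1}

# Bi-gram language model: BIGRAM[a][b] = P(next char is b | current char is a).
BIGRAM = {
    "t": {"t": 0.05, "h": 0.60, "e": 0.30, "j": 0.05},
    "h": {"t": 0.10, "h": 0.05, "e": 0.80, "j": 0.05},
    "e": {"t": 0.40, "h": 0.30, "e": 0.20, "j": 0.10},
    "j": {"t": 0.25, "h": 0.25, "e": 0.25, "j": 0.25},
}

# Acoustic model: EMIT[c][k] = P(keystroke lands in cluster k | typed char is c).
# Cluster 11 is deliberately ambiguous between the neighbouring keys "h" and "j".
EMIT = {
    "t": {5: 0.8, 11: 0.1, 2: 0.1},
    "h": {5: 0.2, 11: 0.6, 2: 0.2},
    "e": {5: 0.1, 11: 0.1, 2: 0.8},
    "j": {5: 0.1, 11: 0.8, 2: 0.1},
}


def acoustic_only(clusters):
    """Pick the most likely character for each cluster, ignoring context."""
    return "".join(max(CHARS, key=lambda c: EMIT[c][k]) for k in clusters)


def viterbi(clusters):
    """Most likely character sequence under the bi-gram HMM (log domain)."""
    score = {c: math.log(START[c]) + math.log(EMIT[c][clusters[0]]) for c in CHARS}
    back = []  # back[i][c] = best predecessor of c at position i + 1
    for k in clusters[1:]:
        new_score, pointers = {}, {}
        for c in CHARS:
            prev = max(CHARS, key=lambda p: score[p] + math.log(BIGRAM[p][c]))
            new_score[c] = (score[prev] + math.log(BIGRAM[prev][c])
                            + math.log(EMIT[c][k]))
            pointers[c] = prev
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final character.
    path = [max(CHARS, key=lambda c: score[c])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))


if __name__ == "__main__":
    print(acoustic_only([5, 11, 2]))  # tje
    print(viterbi([5, 11, 2]))        # the
```

With these toy numbers the sound alone slightly favours “j” for cluster 11, so context-free decoding yields “tje”, while the bi-gram prior pulls the same observations to “the”.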
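
Example sketch: cepstrum features for one keystroke
The “Sound of a Keystroke” slide describes each keystroke as a vector of cepstrum features, obtained from a Fourier transform of the decibel spectrum. The Python sketch below computes a plain real cepstrum of one audio frame using only the standard library. It is an illustration under assumptions, not the exact feature pipeline from the paper: the function names, the number of coefficients kept, and the floor value are all hypothetical choices, and the naive O(n²) DFT is used only to keep the example self-contained. The second transform is written as an inverse DFT, the conventional definition; for a real signal the decibel spectrum is symmetric, so a forward transform would differ only by a constant scale factor.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)); fine for short frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def real_cepstrum(frame, n_coeffs=8, floor=1e-12):
    """Return the first n_coeffs real-cepstrum coefficients of a frame.

    n_coeffs and floor are illustrative defaults, not values from the paper.
    """
    spectrum = dft(frame)
    # Decibel magnitude spectrum; the floor avoids log10(0) on empty bins.
    decibels = [20 * math.log10(max(abs(v), floor)) for v in spectrum]
    cepstrum = dft(decibels, inverse=True)
    return [c.real for c in cepstrum[:n_coeffs]]


if __name__ == "__main__":
    # A damped sinusoid standing in for one recorded keystroke.
    frame = [math.exp(-t / 10) * math.sin(2 * math.pi * 5 * t / 64)
             for t in range(64)]
    print(real_cepstrum(frame))
```

One property worth noting for this kind of feature: making the recording 10 times louder adds 20 dB to every frequency bin, which changes only coefficient 0 and leaves the remaining coefficients unchanged.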