Stanford CS 224n - Gender Classification of Japanese Authors


Gender Classification of Japanese Authors
David Edwards & Cybelle Smith

Gendered Speech in Japanese
• Gender of speaker may be overtly marked: gender-specific first-person pronouns
  – 僕, boku, male
  – 俺, ore, male
  – 私, watashi, female or neutral
• Question: Does gender have less-overt effects on Japanese texts as well?
• Can word choice, morphology, writing style indicate gender, even in noisy environments like fiction writing?

Corpora
“Peace” Corpus
• 29 personal essays by middle school students
• Topic: “Peace”
• 29 authors:
  – 22 female
  – 7 male
“Bookstudio” Corpus
• 485 installments of online novels
• Genre: Fantasy
• 40 authors:
  – 20 female
  – 20 male
• Also collected ~181 installments from authors of unknown gender (for future research)

Our Baseline - The “Boku” Test
Corpus       Male Accuracy   Female Accuracy   Overall Accuracy
Peace        .71             1.0               .93
Bookstudio   .91             .43               .67

Classifiers Used
Naïve Bayes:
• Build conditional probabilities of features given gender
• Calculate probability of test data given a particular gender
• Select highest-probability gender
SVM:
• Used the LIBSVM free classifying tool
• Find dividing hyperplane in num-feature dimensional space
  – Requires problem-specific parameters chosen via cross-validation
• Apply hyperplane to test data
Also attempted Logistic Regression

Chasen: Segmenter and POS-tagger
Output columns: Stem, Pronunciation, Lemma, Part of Speech
[Sample Chasen output; the Japanese text did not survive extraction]

Features
Stem     Pron     Lemma    POS
KURAki   kuraki   KURAi    adjective - independent
[Romanized gloss of one token; the Japanese row did not survive extraction]

Features
• Kanji (Chinese character)
• Hiragana (phonetic)
• Katakana (phonetic, like italics)
[Japanese example text did not survive extraction]

Single-feature performance on Naïve Bayes:
Feature            Indic  Stem  Lem   Pron  POS   Quot  WS    SPDWS1  SPDWS2
Male Accuracy      .29    .67   .68   .70   .80   .23   .66   .49     .87
Female Accuracy    .51    .77   .78   .74   .45   .33   .85   .81     .68
Overall Accuracy   .40    .72   .73   .72   .63   .28   .76   .66     .77

Multi-feature performance on Naïve Bayes:
Candidate features: Stem, Lem, Pron, POS, Quot, WS, SPDWS1, SPDWS2
[The column positions of the X marks did not survive extraction; only the number of features marked per trial is shown]
Trial   Features marked   Male Acc.   Female Acc.   Overall Acc.
1       2                 .63         .73           .68
2       2                 .81         .73           .77
3       2                 .70         .76           .73
4       2                 .68         .76           .72
5       2                 .68         .78           .73
6       8 (all)           .70         .70           .70
7       7                 .70         .73           .71

SVM Performance
• Optimizations:
  – Scaling counts to avoid swamping low-frequency features
  – Selecting optimal error rate and kernel parameters
Accuracy:
Features                           No Scaling   Scaling   Cross Validation (Training Set)   Cross Validation (Test Set)
All features (except quotations)   50.6%        48.5%     79.7%                             50.0%
Part of Speech                     50.9%        53.0%     68.0%                             47.3%
Wordshape                          50.6%        63.3%     75.2%                             50.6%
Pronunciation                      50.6%        64%       77.8%                             51.8%

Conclusion
• Without considering gendered pronouns, we achieved similar performance
• Most-indicative feature: wordshape (use of kanji vs. hiragana vs. katakana etc.), especially where multiple options exist
• Point of interest: male and female Japanese authors differ not just in the words they use, but how they choose to write those
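
The wordshape feature named in the conclusion can be illustrated with a short Python sketch. The slides do not define how WS, SPDWS1, or SPDWS2 were computed, so this is only one plausible reading: each character is labelled by script using its Unicode block, and runs of the same script are collapsed. The function names `char_class` and `word_shape` and the labels C/H/K/O are hypothetical and are not taken from the presentation.

```python
from itertools import groupby


def char_class(ch: str) -> str:
    """Label one character by script, using Unicode block ranges."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:   # Hiragana block
        return "H"
    if 0x30A0 <= code <= 0x30FF:   # Katakana block
        return "K"
    if 0x4E00 <= code <= 0x9FFF:   # CJK Unified Ideographs (kanji)
        return "C"
    return "O"                     # anything else: Latin, digits, punctuation


def word_shape(token: str) -> str:
    """Collapse consecutive characters of the same script into one label.

    This run-collapsing scheme is an assumption for illustration,
    not the authors' documented feature definition.
    """
    classes = (char_class(ch) for ch in token)
    return "".join(label for label, _ in groupby(classes))


if __name__ == "__main__":
    # The same word written three ways yields three different shapes.
    for token in ["暗き", "くらき", "クラキ"]:
        print(token, word_shape(token))
```

Under this scheme a word written with kanji plus a hiragana ending gets a different shape from the same word written entirely in hiragana or katakana, which is the kind of choice the conclusion describes as varying "where multiple options exist".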

