
Fairness in AI/ML

Goal: Understand and apply basic AI/ML techniques to data scenarios, with a focus on instituting fair practices when designing decision-making systems based on big data.

Number of publications on fairness from 2011 to 2017
https towardsdatascience com a tutorial on fairness in machine learning 3ff8ba1040cb

Fairness in AI/ML
https www cbinsights com research google amazon facebook apple hiring techlash

Word Embedding (NLP)

Word embeddings are a set of techniques in natural language processing (NLP) for identifying similarities between words in a corpus by using some type of model to predict the co-occurrence of words within a small chunk of text.

Word embeddings transform human language meaningfully into a numerical form. This allows computers to capture nuances that are implicitly encoded in our languages.

[Figure: example relationships captured by embeddings. Panels: Male-Female, Verb tense, Country-Capital]

Word Embedding (NLP)
https nypost com 2017 07 05 microsofts chatbots keep turning racist
https qz com 1340990 microsofts politically correct chat bot is even worse than its racist one
https arstechnica com information technology 2016 03 microsoft terminates its tay ai chatbot after she turns into a nazi
https mashable com article google gmail smart compose gender bias JzgdqOqaViqu

Word Association Game: Play Blocks
https research google com semantris

Word Analogy
http demos spinningbytes com doesntMatch html

1. Write out a list of 10 occupations (job classifications).
2. Enter a gender and two of the occupations, e.g., "woman nurse scientist" and "man nurse scientist".
3. Repeat.
4. Document and submit the results.

Group assignment: complete, and then report back.
Note: the tool traditionally only recognizes two gender classes (woman, man).

Word Similarity/Relatedness

Representing words as vectors has become a convenient way to compute similarities. Relatedness measures the semantic similarity between words:

- How similar is pizza to pasta?
- How related is pizza to Italy?

Vectorization is the process of converting text to numbers. This conversion helps us to measure
the similarity between words.

A vector space model is an algebraic model for representing text as a vector of identifiers, in which semantically similar words are mapped to nearby points in geometric space.

Vector Space Models

One representation: Document Occurrence

Assign identifiers corresponding to the count of the word in each document, taken from a cluster of documents in which the word occurs.

Chocolate = 0 1 0 1 0 0 2 0 1 0 1 0
(the nonzero entries correspond to Menu2, Menu4, Menu7, Menu9, and Menu11)

One representation: Word Context

Quantify the co-occurrence of terms in a corpus by constructing a co-occurrence matrix, which captures the number of times a term appears in the context of another term.

Corpus:
- Chocolate is the best dessert in the world.
- GeorgiaTech is the best university in the world.
- The world runs on chocolate.

              chocolate   best   dessert   university   world
chocolate         0         1       1          0          2
best              1         0       1          1          1
dessert           1         1       0          0          1
university        0         1       0          0          1
world             2         1       1          1          0

Vector Space Models: Example

We have four tiny documents:

Document 1: atlanta falcons jerseys
Document 2: atlanta falcons highlights
Document 3: losangeles dodgers jerseys
Document 4: losangeles dodgers highlights

If we use document occurrence vectors:

              Document 1   Document 2   Document 3   Document 4
atlanta           1            1            0            0
falcons           1            1            0            0
losangeles        0            0            1            1
dodgers           0            0            1            1

Here atlanta and falcons are similar, and losangeles and dodgers are similar.

If we use word context vectors (rows are context words, columns are the words being represented):

              Atlanta   Falcons   Dodgers   Los Angeles
Atlanta          0         2         0           0
Falcons          2         0         0           0
Jerseys          0         1         1           0
Highlights       0         1         1           0
Dodgers          0         0         0           2
Los Angeles      0         0         2           0

Here Falcons and Dodgers are similar, because they share the contexts Jerseys and Highlights.

Cosine Similarity

We can use cosine similarity to compute the similarity between two word vectors. However, this notion of similarity depends on which vector representation is selected to represent the words found
in your corpus.

- Falcons is similar to Dodgers, because they are both sports teams (word context vectors)
OR
- Atlanta is similar to Falcons, because of "Atlanta Falcons" ("Go Falcons!") (document occurrence vectors)

Computing Similarity/Relatedness

Given two vectors a and b, the cosine similarity is defined as the dot product of the two vectors divided by the product of their lengths: $\cos\theta = \frac{a \cdot b}{\|a\|\,\|b\|}$.

- Cosine close to 1: the vectors are quite similar to each other
- Cosine close to 0: the vectors are not similar
- Cosine close to -1: the vectors are similar but opposite

Mathematically, cosine similarity measures the cosine of the angle between two vectors projected in a multi-dimensional space, where the two vectors are the word vectors described previously.

Cosine Similarity: Example

Using the document occurrence vectors:

Atlanta       1 1 0 0
Falcons       1 1 0 0
Los Angeles   0 0 1 1
Dodgers       0 0 1 1

Similarity(atlanta, falcons)      = 1.0
Similarity(atlanta, los angeles)  = 0.0
Similarity(atlanta, dodgers)      = 0.0
Similarity(falcons, los angeles)  = 0.0
Similarity(falcons, dodgers)      = 0.0
Similarity(los angeles, dodgers)  = 1.0

Thus, as claimed before, based on cosine similarity, Atlanta is similar to Falcons and Los Angeles is similar to Dodgers.

Word Analogy Task

Word analogy problems have become one of the standard tools for evaluating context-based word vectors. The task consists of questions like "a is to b as c is to ___":

- man is to woman as king is to ___
- dog is to cat as bark is to ___
- Atlanta is to Georgia as Los Angeles is to ___

To solve the analogy problem, we need to find the word vector that is most similar to the result vector of c + b - a:

king + woman - man = X

Word Analogy Exercise

http bionlp www utu fi wv demo

1. Select the English GoogleNews model.
2. Navigate to the Word analogy section.

Query                                          Result
man is to woman as king is to ___              queen
China is to Beijing as Russia is to ___        Kremlin
china is to beijing as russia is to ___        USA or Europe
Apple is to Jobs as Microsoft is to ___        Ballmer
student is to teacher as doctor is to ___      dentist or nurse
student is to professor as doctor is to ___    neurologist or cardiologist

Word Embeddings: word2vec

- Stores each word as a point in space, where it is represented by a vector with a fixed number of dimensions (generally 300)
- Unsupervised: built
  just by reading a huge corpus of data
- For example, "Chocolate" might be represented as 1 0 1 1 0 2
- As discussed before, dimensions are projections along different axes

Word Embeddings

[Figure: word vectors for King, Man, Woman, Queen and for Good, Awesome, Bad, Worst]

Similar words have similar angles. We can thus learn analogies:

vector(Queen) ≈ vector(King) - vector(Man) + vector(Woman)

Equivalently, King - Man ≈ Queen - Woman.

Vector Space Models

One representation: Predict the context of a given word by learning probabilities of co-occurrence from a corpus (e.g., skip-gram).
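To make the four-document example concrete, the following is a minimal sketch in plain Python of how the two representations discussed earlier lead to different cosine similarities. The helper names (`cosine`, `document_occurrence`, `word_context`) and the context window of one neighboring word on each side are assumptions chosen for illustration; they are not taken from the slides.

```python
import math
from collections import Counter

# The four tiny documents from the example.
DOCS = [
    ["atlanta", "falcons", "jerseys"],
    ["atlanta", "falcons", "highlights"],
    ["losangeles", "dodgers", "jerseys"],
    ["losangeles", "dodgers", "highlights"],
]

# Fixed, sorted vocabulary so every word-context vector uses the same axes.
VOCAB = sorted({word for doc in DOCS for word in doc})


def cosine(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (norm_a * norm_b)


def document_occurrence(word):
    """One entry per document: how many times the word occurs in it."""
    return [doc.count(word) for doc in DOCS]


def word_context(word, window=1):
    """One entry per vocabulary word: how often it appears within
    `window` positions of `word` (window=1 is an assumed setting)."""
    counts = Counter()
    for doc in DOCS:
        for i, token in enumerate(doc):
            if token != word:
                continue
            start = max(0, i - window)
            stop = min(len(doc), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    counts[doc[j]] += 1
    return [counts[v] for v in VOCAB]


# Compare the same word pairs under both representations.
for name, vectorize in [("document occurrence", document_occurrence),
                        ("word context", word_context)]:
    for w1, w2 in [("atlanta", "falcons"), ("falcons", "dodgers")]:
        score = cosine(vectorize(w1), vectorize(w2))
        print(f"{name}: similarity({w1}, {w2}) = {score:.2f}")
```

Under these assumptions, the document occurrence vectors make atlanta and falcons maximally similar while falcons and dodgers score 0, whereas the word context vectors reverse the picture: falcons and dodgers share the contexts jerseys and highlights, and atlanta and falcons share none.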



GT CS 6603 - Fairness in AI/ML
