BYU BIO 465 - Analysing Gene Expression Data



Analysing Gene Expression Data
Classification using Machine Learning Approaches
David Gilbert [email protected]
Research Centre, Department of Computing Science, University of Glasgow
Slides adapted from Aik Choon Tan [email protected]
(c) DRG / AC TAN 2005

Outline
• Introduction
• Data, features, classifiers, inductive learning
• (Selected) Machine Learning Approaches
– Decision trees
– Support Vector Machines
– Artificial Neural Networks
– Naïve Bayes
• Feature Selection
• Model evaluation
• Model Interpretation
• WEKA

Cancer Classification Problem
ALL: acute lymphoblastic leukemia (lymphoid precursors)
AML: acute myeloid leukemia (myeloid precursor)
(Golub et al 1999)

Gene Expression Profile
n genes × m samples

Gene id    Condition 1    Condition 2    …    Condition m
Gene 1     103.02         58.79          …    101.54
Gene 2     40.55          1246.87        …    1432.12
…          …              …              …    …
Gene n     78.13          66.25          …    823.09

A (very) Brief Introduction to Machine Learning

To Learn
“ … to acquire knowledge of (a subject) or skill in (an art, etc.) as a result of study, experience, or teaching … ” (OED)

What is Machine Learning?
“ … a computer program that can learn from experience with respect to some class of tasks and performance measure … ” (Mitchell, 1997)

Why is Machine Learning Important?
• Some tasks cannot be defined well, except by examples (e.g., recognizing people).
• Relationships and correlations can be hidden within large amounts of data. Machine Learning/Data Mining may be able to find these relationships.
• Human designers often produce machines that do not work as well as desired in the environments in which they are used.
• The amount of knowledge available about certain tasks might be too large for explicit encoding by humans (e.g., medical diagnostics).
• Environments change over time.
• New knowledge about tasks is constantly being discovered by humans. It may be difficult to continuously re-design systems “by hand”.

Broader context
• What is learning?
– Memorising?
– Prediction?

Machine Learning Approach
[Figure: Gene Expression Profiles → Machine Learning → Black/White Box Classifier → Prediction (Yes/No)]

Key Steps of Learning
• learning task
– what is the learning task?
• data and assumptions
– what data is available for the learning task?
– what can we assume about the problem?
• representation
– how should we represent the examples to be classified?
• method and estimation
– what are the possible hypotheses?
– how do we adjust our predictions based on the feedback?
• evaluation
– how well are we doing?
• model selection
– can we rethink the approach to do even better?

Learning Tasks
• Classification – Given positive and negative examples, find hypotheses that distinguish these examples. It can extend to multi-class classification.
• Characterisation – Given positive examples, find hypotheses that describe these examples.
• Clustering – Given a set of unlabelled examples, find clusters for these examples (unsupervised learning).

Sheep…

Goats…

Classification – separating sheep from goats
Big Horn Sheep [Ovis canadensis]
The Big Horn Sheep [Ovis canadensis] is a large North American species with a brown coat, which turns to bluish-grey in winter. It is so named from the size of the horns of the ram, which often measure over 1 m/3.3 ft round the curve.
Classification: Ovis canadensis is in family Bovidae, order Artiodactyla

Learning Approaches
• Supervised approach – given a set of positive and negative examples with predefined classes, construct classifiers that distinguish between the classes. Examples have the form <x, y>.
• Unsupervised approach – given unassigned examples, group together the examples with similar properties. Examples have the form <x>.

Data and assumptions
[Figure: expression matrix of samples S1, S2, S3, S4, S5, …, Sn over genes g1 to g15, each sample labelled YES or NO]
- Is this a classification problem?
- How are the data/labels generated?

Representation
S1 (YES): x = 111001100101001, y = +1
S = (x, y), where x ∈ {1, 0} (red, green) and y ∈ {-1, +1} (NO, YES)
There are many ways to represent the same information.
The choice of representation may determine whether the learning task is very easy or very difficult.

Concept Learning
Given: a set of training examples S = {(x1,y1),…,(xm,ym)}, where x is the set of instances, usually in the form of a tuple <x1,…,xn>, and y is the class label. The function y = f(x) is unknown, and finding f(x) represents the essence of concept learning.
For a binary problem y ∈ {1,0}, the unknown function is f: X → {1,0}.
The learning task is to find a hypothesis h(x) = f(x) for x ∈ X.
Training examples (x, f(x)) where:
f(x) = 1 are positive examples,
f(x) = 0 are negative examples.
A machine learning task:
Find a hypothesis h(x) = f(x), x ∈ X (in reality, the ML task is usually to approximate, h(x) ≅ f(x)).
H is the set of all possible hypotheses, where h: X → {1,0}.
[Figure: hypothesis space H, with h(x) = f(x)]

Inductive Learning
• Given a set of observed examples
• Discover concepts from these examples
– class formation/partition
– formation of relations between objects
– patterns

Regression
• Linear regression: a method of estimating the conditional expected value of one variable y given the values of some other variable or variables x.
• “Linear” because the relation of the dependent to the independent variables is a linear function of some parameters.
• Regression models which are not a linear function of the parameters are called nonlinear regression models. A neural network is an example of a nonlinear regression model.

Linear Regression
[Figure: candidate fits y = f1(x), y = f2(x), y = f3(x)]

Decision Trees
• Widely used - simple and practical
• Quinlan - ID3 (1986), C4.5 (1993) & See5/C5 (latest)
• Classification and Regression Tree (CART, Breiman et al., 1984)
• Given a set of instances (with a set of properties/attributes), the learning system constructs a tree whose internal nodes are attributes and whose leaves are the classes
• Supervised learning
• Symbolic learning, gives interpretable results

Information Theory - Entropy
$Entropy(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$
$Entropy(S) = -p_{+} \log_2 p_{+} - p_{-} \log_2 p_{-}$
Entropy – a measurement commonly used in information theory to characterise the (im)purity of an arbitrary
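
As a minimal sketch of the entropy formula on the slide above, the class proportions p_i can be estimated from label counts and plugged into the sum. The function name `entropy` and the example label lists are assumptions made for illustration; they do not come from the slides.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i), with p_i the fraction of
    examples in S that carry class label i."""
    total = len(labels)
    if total == 0:
        # Assumption: an empty set is treated as having zero entropy.
        return 0.0
    counts = Counter(labels)
    # Only classes that actually occur are counted, so p_i > 0 and
    # log2(p_i) is always defined.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical label sets (not data from the slides).
mixed = ["ALL", "AML", "ALL", "AML"]   # two classes in equal proportion
pure = ["ALL", "ALL", "ALL", "ALL"]    # a single class

print(entropy(mixed))
print(entropy(pure))
```

For the two-class case this reduces to the second formula, with p_+ and p_- the fractions of positive and negative examples: an evenly mixed set gives the maximum of 1 bit, and a set containing a single class gives 0.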

