My Research Work and Clustering
Dr. Bernard Chen, Ph.D.
University of Central Arkansas
Fall 2010

Outline
- Introduction
- Experimental Setup
- Clustering
- Future Works

Central Dogma of Molecular Biology

Amino Acids, the subunit of proteins

Protein Primary, Secondary, and Tertiary Structure

Protein 3D Structure

Protein Sequence Motif
- Although there are 20 amino acids, protein primary structure is not built by choosing among them at random.
- Sequence motif: one of a relatively small number of functionally or structurally conserved sequence patterns that occur repeatedly in a group of related proteins.

Protein Sequence Motif
These biologically significant regions or residues are usually:
- Enzyme catalytic sites
- Prosthetic group attachment sites (heme, pyridoxal-phosphate, biotin…)
- Amino acids involved in binding a metal ion
- Cysteines involved in disulfide bonds
- Regions involved in binding a molecule (ATP/ADP, GDP/GTP, Ca, DNA…)

Goal of our group
- The main purpose is to obtain and extract protein sequence motifs that are universally conserved across protein family boundaries.
- Discuss the relation between protein primary structure and tertiary structure.

Experiment setup: HSSP matrix: 1b25

Representation of Segment
- Sliding window size: 9
- Each window corresponds to a sequence segment, which is represented by a 9 × 20 matrix plus the nine corresponding secondary structure labels obtained from DSSP.
- More than 560,000 segments (413MB) are generated by this method.
- DSSP: source of the secondary structure information

Clustering Algorithms
We used two clustering algorithms in our approach:
- K-means Clustering
- Fuzzy C-means Clustering

K-means Clustering

Fuzzy C-means Clustering

Granular Computing Model
Original dataset -> Fuzzy C-Means Clustering -> Information Granule 1 ... Information Granule M -> K-means Clustering (per granule) -> Join Information -> Final Sequence Motifs Information

Motivation

Reduce Space-complexity

                   Number of Members   Number of Clusters   Data Size
Granule 0          136112              151                  99.9MB
Granule 1          68792               76                   50.5MB
Granule 2          86094               95                   63.2MB
Granule 3          65361               72                   47.9MB
Granule 4          63159               70                   46.3MB
Granule 5          120130              133                  88.2MB
Granule 6          128874              143                  94.6MB
Granule 7          4583                5                    3.3MB
Granule 8          43254               48                   31.7MB
Granule 9          5032                6                    3.7MB
Total              721390              799                  529MB
Original dataset   562745              800                  413MB

Table 1: Summary of results obtained by FCM

Reduce Time-complexity
- Wei’s method: 1285968 sec (15 days) * 6 = 7715568 sec (90 days)
- Granular Model: 154899 sec (FCM exe time) + 231720 sec (2.7 days) * 6 = 1545219 sec (18 days)

HSSP-BLOSUM62 Measure

Future Works
- Part 1: Bioinformatics Knowledge and Dataset Collection
- Part 2: Discovering Protein Sequence Motifs
- Part 3: Motif Information Extraction
- Part 4: Mining the Relations between Motifs and Motifs
- Part 5: Protein Local Tertiary Structure Prediction

PART3: Protein information extraction by Decision Tree

PART4: Clustering with association rule and graph theory

PART4: Super rule generation by DB-Scan
- Apply DB-Scan to build up super-rules among all motifs

PART5: Protein local tertiary structure prediction
By:
- Decision Tree
- Naïve Bayesian
- Association rule algorithms and
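
Sketch: Representation of Segment
The segment representation described earlier (window size 9, one 9 × 20 matrix plus nine secondary structure labels per window) can be illustrated with a short Python sketch. This is not the group's code. The function name `sliding_segments`, the toy profile, and the one-letter structure string are assumptions made only for illustration; a real run would read the 20-column profile from an HSSP file and the labels from DSSP.

```python
from typing import List, Tuple

WINDOW = 9           # sliding window size stated on the slide
N_AMINO_ACIDS = 20   # one profile row holds 20 amino acid frequencies

Profile = List[List[float]]  # one 20-value row per residue position


def sliding_segments(profile: Profile, structure: str,
                     window: int = WINDOW) -> List[Tuple[Profile, str]]:
    """Return every window as (window x 20 matrix, window structure labels)."""
    if len(profile) != len(structure):
        raise ValueError("profile and structure must have the same length")
    segments = []
    # A protein of length L yields L - window + 1 overlapping segments.
    for start in range(len(profile) - window + 1):
        rows = [list(row) for row in profile[start:start + window]]
        segments.append((rows, structure[start:start + window]))
    return segments


# Toy input (hypothetical): 12 residues, each row a one-hot "frequency" vector.
toy_profile = [[float(i == pos % N_AMINO_ACIDS) for i in range(N_AMINO_ACIDS)]
               for pos in range(12)]
toy_structure = "HHHHEEEECCCC"  # assumed labels: H helix, E sheet, C coil

segments = sliding_segments(toy_profile, toy_structure)
print(len(segments))    # number of 9-residue windows in a 12-residue protein
print(segments[0][1])   # structure labels of the first window
```

Because consecutive windows overlap in eight positions, the number of segments grows with the total number of residues in the dataset, which is why the segment file is large.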
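
Sketch: Splitting data into information granules
In Table 1 the granules together hold more members than the original dataset, which is consistent with some segments being placed in more than one granule. The sketch below shows one way that can happen with fuzzy C-means memberships. It is a minimal illustration, not the group's implementation: the centers are fixed rather than learned by iterating FCM, the points are 2-D toy values rather than 9 × 20 segments, and the `threshold` rule and the names `fcm_memberships` and `assign_to_granules` are assumptions.

```python
import math
from typing import List, Sequence


def fcm_memberships(point: Sequence[float],
                    centers: Sequence[Sequence[float]],
                    m: float = 2.0) -> List[float]:
    """Standard FCM membership of one point in each cluster, for fixed centers."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    # u_j = 1 / sum_k (d_j / d_k)^(2 / (m - 1)); the values sum to 1.
    return [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]


def assign_to_granules(points: Sequence[Sequence[float]],
                       centers: Sequence[Sequence[float]],
                       threshold: float,
                       m: float = 2.0) -> List[List[int]]:
    """Put each point index in every granule whose membership meets the threshold."""
    granules: List[List[int]] = [[] for _ in centers]
    for idx, point in enumerate(points):
        for g, u in enumerate(fcm_memberships(point, centers, m)):
            if u >= threshold:  # assumed rule: one point may pass for several granules
                granules[g].append(idx)
    return granules


# Toy data (hypothetical): two centers, one point halfway between them.
centers = [(0.0, 0.0), (10.0, 0.0)]
points = [(1.0, 0.0), (9.0, 0.0), (5.0, 0.0)]
granules = assign_to_granules(points, centers, threshold=0.4)
print(granules)                         # point 2 appears in both granules
print(sum(len(g) for g in granules))    # total members exceeds len(points)
```

Each granule is smaller than the full dataset, so a K-means run on one granule needs less memory and time than a run on everything, at the cost of storing the shared points more than once.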