RNA folding at elementary step resolution

CHRISTOPH FLAMM,1 WALTER FONTANA,2,3 IVO L. HOFACKER,1 and PETER SCHUSTER1

1 Institut für Theoretische Chemie und Molekulare Strukturbiologie, Universität Wien, A-1090 Wien, Austria
2 Santa Fe Institute, Santa Fe, New Mexico 87501 USA

ABSTRACT

We study the stochastic folding kinetics of RNA sequences into secondary structures with a new algorithm based on the formation, dissociation, and the shifting of individual base pairs. We discuss folding mechanisms and the correlation between the barrier structure of the conformational landscape and the folding kinetics for a number of examples based on artificial and natural sequences, including the influence of base modification in tRNAs.

Keywords: conformational spaces; foldability; RNA folding kinetics; RNA secondary structure

INTRODUCTION

The conformational diversity of nucleic acids or proteins is delimited by the loose random coil and the compact native state that is frequently the most stable or minimum free energy (mfe) conformation. Let us call a specific interaction between two segments of the chain a "contact." A random coil then is best characterized by the absence of contacts, whereas the mfe conformation maximizes their energetic contributions. Several different types of contacts are found in three-dimensional structures. Their energetics is not well understood, which makes the modeling of RNA folding from random coils into full structures too ill-defined to be tackled at present.

Fortunately, for single-stranded nucleic acid molecules, the simpler coarse-grained notion of secondary structure is accessible to mathematical analysis and computation. To a theorist the secondary structure is the topology of binary contacts that arises from specific base pairing (Watson–Crick and GU; see Figure 1 and the next section). It does not refer to a two- or three-dimensional geometry cast in terms of distances. Secondary structure formation is driven by the stacking between contiguous base pairs. However,
any formation of an energetically favorable double-stranded region implies the simultaneous formation of an energetically unfavorable loop. This frustrated energetics leads to a vast combinatorics of stack and loop arrangements spanning the conformational repertoire of an individual RNA sequence at the secondary structure level.

The secondary structure is not only an abstract tool convenient for theorists. It also corresponds to an actual state that provides a geometric, kinetic, and thermodynamic scaffold for tertiary structure formation, and constitutes an intermediate on the folding path from random coil to full structure. With rising temperature, tertiary contacts usually disappear first and double helices melt later (Banerjee et al., 1993). The free energy of secondary structure formation accounts for a large fraction of the free energy of full structure formation. These roles put the secondary structure in correspondence with functional properties of the tertiary structure. Consequently, selection pressures become observable at the secondary structure level in terms of evolutionarily conserved base pairs (Gutell, 1993). Moreover, insights into the process of secondary structure formation can be extended to several types of tertiary contacts with roughly conserved local geometries, such as non-Watson–Crick base pairs, base triplets and quartets, or end-on-end stacking of double helices.

To provide a frame for our kinetic treatment of RNA folding, we give a short account of the formal issues surrounding conformational spaces, folding trajectories, and folding paths for RNA secondary structures. We then introduce the kinetic folding algorithm as a stochastic process in the conformation space of a sequence, and discuss applications to several selected problems that cannot be studied adequately with the thermodynamic approach alone.

Reprint request to: Christoph Flamm, Institut für Theoretische Chemie und Molekulare Strukturbiologie, Währingerstrasse 17, A-1090 Wien, Austria;
e-mail: xtof@tbi.univie.ac.at.
3 Present address: Institute for Advanced Study, Program in Theoretical Biology, 310 Olden Lane, Princeton, New Jersey 08540, USA.

RNA (2000), 6:325–338. Cambridge University Press. Printed in the USA. Copyright © 2000 RNA Society.

CONFORMATION SPACES AND FOLDING PATHWAYS

We denote an RNA sequence by a string $I = (x_1 x_2 \ldots x_n)$ of $n$ positions over the conventional nucleotide alphabet, $x_i \in \mathcal{A} = \{\mathrm{A}, \mathrm{U}, \mathrm{G}, \mathrm{C}\}$. (If we need to distinguish between sequences $I_k$, we use superscripts, as in $x_i^{(k)}$, to denote the $i$th nucleotide of sequence $I_k$.) The bases $x_1$ and $x_n$ are the nucleotides at the 5' end and the 3' end of the sequence, respectively. A secondary structure $S$ can be conveniently discretized as a graph representing a pattern of contacts or base pairs (Fig. 1). The nodes of the graph correspond to bases $x_i$ at positions $i = 1, \ldots, n$. The set of edges consists of two disjoint subsets. One subset is common to all secondary structure graphs, and represents the covalent backbone connecting the nodes $i$ and $i+1$ for $i = 1, \ldots, n-1$. The other comprises the base pairs, denoted by $i \cdot j$, and constitutes the secondary structure proper. The base pairs form a set $P$ with $j \notin \{i-1, i, i+1\}$ that must satisfy two conditions: (1) every edge in $P$ connects a node to at most one other node, and (2) if both $i \cdot j$ and $k \cdot l$ are in $P$, then $i < k < j$ implies $i < l < j$. Failure to meet condition (2) results in pseudoknots that are considered tertiary contacts.

Secondary structure graphs are formal combinatorial objects amenable to mathematical treatment. Of particular interest are secondary structures satisfying some extremal condition, such as minimizing the free energy (mfe structures). They can be computed by dynamic programming (Waterman & Smith, 1978; Nussinov & Jacobson, 1980; Zuker & Stiegler, 1981). We have recently extended the standard RNA thermodynamic folding algorithm to compute all conformations within some energy range above the mfe (Wuchty et al., 1999). This enables us to analyze the low-energy portion of
the conformational landscape of individual sequences, and to put it in correspondence with their kinetic folding behavior derived from a computational model that we present below.

A sequence $I$ is called compatible with a secondary structure $S$, whenever positions that pair in the specification of $S$ ($i \cdot j \in P(S)$) are occupied by nucleotides that can actually pair with one another:

$$ i \cdot j \;\Rightarrow\; [x_i x_j] \in \mathcal{B} = \{\mathrm{AU}, \mathrm{UA}, \mathrm{UG}, \mathrm{GU}, \mathrm{GC}, \mathrm{CG}\}, \quad \forall\, i \cdot j \in P(S). $$

A sequence $I$ specifies a set of structures with which it is compatible, $\mathcal{S}(I) =$
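
As an illustration of the definitions in this section, the two conditions on the pair set $P$ and the compatibility rule can be checked mechanically. The following Python sketch is not the authors' implementation. The function names `is_secondary_structure` and `is_compatible`, and the representation of $P$ as a list of 1-based position pairs, are assumptions made only for this example. It tests exactly the conditions stated in the text and applies no additional constraints from any energy model.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[int, int]

# The set B of allowed pairs from the text: Watson-Crick plus GU.
ALLOWED_PAIRS = {"AU", "UA", "UG", "GU", "GC", "CG"}


def is_secondary_structure(pairs: Iterable[Pair], n: int) -> bool:
    """Check that `pairs` (1-based positions) is a valid pair set P
    for a chain of n nodes, following the conditions in the text."""
    seen = set()
    norm: List[Pair] = []
    for a, b in pairs:
        i, j = min(a, b), max(a, b)
        if i < 1 or j > n:
            return False          # position outside the chain
        if j - i < 2:
            return False          # j must not be in {i-1, i, i+1}
        if i in seen or j in seen:
            return False          # condition (1): at most one partner
        seen.update((i, j))
        norm.append((i, j))
    # Condition (2): if i < k < j then i < l < j (no pseudoknots).
    for i, j in norm:
        for k, l in norm:
            if i < k < j and not (i < l < j):
                return False
    return True


def is_compatible(seq: str, pairs: Iterable[Pair]) -> bool:
    """Check that every pair i.j in P is occupied by nucleotides
    [x_i x_j] that belong to the allowed set B."""
    return all(seq[i - 1] + seq[j - 1] in ALLOWED_PAIRS for i, j in pairs)


if __name__ == "__main__":
    seq = "GGGAAAUCCC"                 # hypothetical example sequence
    hairpin = [(1, 10), (2, 9), (3, 8)]
    print(is_secondary_structure(hairpin, len(seq)))
    print(is_compatible(seq, hairpin))
```

The pairwise loop for condition (2) is quadratic in the number of base pairs, which keeps the sketch close to the wording of the definition. A stack-based scan over a bracket notation would be a faster alternative for long sequences.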