DREXEL CS 265 - Regular Expressions_in_Perl_Presentation_a_c


Regular Expressions in Perl – Part 2
By: Andrew Cory

Grouping Things & Hierarchical Matching
Grouping characters: ( and )
Grouping allows part of a regular expression to be treated as a single unit.
It is useful for matching several words or phrases that share the same base characters or words.
Ex. /house(cat|keeper)/ is equivalent to /housecat|housekeeper/
Ex. /(a|[bc])d/ matches 'ad', 'bd', or 'cd'
Ex. /(19|20|)\d\d/ matches 19xx, 20xx, or xx (the empty third alternative makes the century optional)

Continued
Backtracking: the step-by-step process of trying an alternative, checking whether it matches, and moving on to the next alternative if it doesn't.
A regular expression with alternations or quantifiers can usually match a string along several different paths; the engine tries them in order until one succeeds or all have failed.
Backtracking is a trial-and-error method that works through the string one character at a time.

Continued
Backtracking Example: "abcd" =~ /(af|ab)(ce|c|cd)/;
1 – Start with the letter 'a'.
2 – Try the 1st alternative of the first group, 'af'.
3 – 'a' matches, but 'f' doesn't match 'b'; backtrack to 'a' and try the 2nd alternative, 'ab'.
4 – 'a' and 'b' match the first 2 characters, so the first group is satisfied; move on to the next group.
5 – 'c' matches, but 'e' doesn't match 'd'; backtrack to 'c' and try the 2nd alternative, 'c'.
6 – 'c' matches, so the second group is satisfied and the whole expression matches 'abc', the first three characters of "abcd".
Note: the 3rd alternative of the second group, 'cd', would also match, but it is never tried: the string has already satisfied the regular expression.

Extracting Matches
Parentheses not only group, they also capture the parts of the string that matched each group, storing them in $1, $2, $3, ...
E.g.
    if ($time =~ /(\d\d):(\d\d):(\d\d)/) {
        $hours   = $1;
        $minutes = $2;
        $seconds = $3;
    }
or, in list context:
    ($hours, $minutes, $seconds) = ($time =~ /(\d\d):(\d\d):(\d\d)/);

Continued
Nested groups capture nested parts of the match; groups are numbered by the order of their opening parentheses, from left to right.
Ex.
    /(ab(cd|ef)((gi)|j))/;
$1 = the whole outer group, e.g. 'abcdgi' or 'abefj'
$2 = 'cd' or 'ef'
$3 = 'gi' or 'j'
$4 = 'gi' (undefined if 'j' matched)
Backreferences (\1, \2, ...): related to the matching variables $1, $2, etc., but used inside the regular expression itself.
Useful for matching repeated phrases.
Ex. /(\w\w\w)\1/ matches 'booboo' or 'murmur'

Continued
The positions of the matched portions of the string are stored in the @- and @+ arrays: $-[n] is the offset where group n starts and $+[n] is the offset just past its end.
Ex.
    $x = "Mmm...donut";
    $x =~ /^(Mmm)\.\.\.(donut)/;
    foreach $expr (1..$#-) {
        print "$expr: '${$expr}' at ($-[$expr],$+[$expr])\n";
    }
Output:
    1: 'Mmm' at (0,3)
    2: 'donut' at (6,11)

Continued
Even without any groups, a successful match sets three special variables:
$` is the part of the string before the match
$& is the part of the string that matched
$' is the part of the string after the match
Ex.
    $x = "I like chips";
    $x =~ /like/;
Result: $` = 'I ', $& = 'like', $' = ' chips'

Matching Repetitions
The quantifier characters ?, *, +, and {} match an item repeatedly, so words or syllables of any length can be matched without writing out the repetition.
Definitions
a? = matches 'a' once or not at all
a* = matches 'a' any number of times, including zero
a+ = matches 'a' one or more times (at least once)
a{n,m} = matches 'a' at least n times, but not more than m times
a{n,} = matches 'a' at least n times
a{n} = matches 'a' exactly n times

Continued
Examples
/[a-z]+\s+\d*/ = a lowercase word, at least one whitespace character, and any number of digits (ajc 93, jgro 843986)
/(\w+)\s\1/ = a doubled word of any length with a space in between (jon jon, hidalgo hidalgo)
/y(es)?/i = 'y', 'Y', 'yes', 'Yes', 'YES', etc. (the /i modifier makes the match case-insensitive)

Continued
Perl always tries to match as much of the string as possible with each quantifier, as long as the regular expression as a whole still matches.
E.g. a? first tries to match the 'a'; only if the rest of the regular expression then fails does it fall back to matching zero times.
Ex.
    $x = "the cat in the hat";
    $x =~ /^(.*)(at)(.*)$/;
Result: $1 = 'the cat in the h', $2 = 'at', $3 = '' (0 characters matched)

Continued
Quantifiers that grab as much of the string as possible are known as 'maximal match' or 'greedy' quantifiers.
4 important regular expression principles
Principle 1: Taken as a whole, any regexp will be matched at the earliest possible position in the string.
Principle 2: In an alternation such as (a|b|c), the leftmost alternative that allows the whole regexp to match is the one used.
Principle 3: Greedy quantifiers match as much of the string as possible while still allowing the whole regexp to match.
Principle 4: When there are several greedy quantifiers, the leftmost one has priority: it matches as much as it can, and the quantifiers after it match what remains.

Continued
Examples
    $x = "The programming republic of Perl";
    $x =~ /^(.+)(e|r)(.*)$/;
Result: $1 = 'The programming republic of Pe', $2 = 'r', $3 = 'l'
    $x =~ /.*(m{1,2})(.*)$/;
Result: $1 = 'm', $2 = 'ing republic of Perl' (the leftmost .* keeps the first 'm' of 'mm', leaving only one for m{1,2})

Continued
Sometimes matching the smallest possible piece of a string is essential, so Perl also provides 'minimal match' or 'non-greedy' quantifiers: ??, *?, +?, and {}?.
Definitions
a?? = matches 'a' 0 or 1 times; tries 0 first, then 1
a*? = matches 'a' any number of times, but as few as possible
a+? = matches 'a' 1 or more times, but as few as possible
a{n,m}? = matches 'a' at least n times, not more than m times, as few as possible
a{n,}? = matches 'a' at least n times, as few as possible
a{n}? = matches 'a' exactly n times; same as a{n}

Continued
Examples: the same regexps as above, with non-greedy quantifiers
    $x = "The programming republic of Perl";
    $x =~ /^(.+?)(e|r)(.*)$/;
Result: $1 = 'Th', $2 = 'e', $3 = ' programming republic of Perl'
    $x =~ /.*?(m{1,2})(.*)$/;
Result: $1 = 'mm', $2 = 'ing republic of Perl'

Continued
Note: for non-greedy quantifiers, Principles 3 and 4 are reversed: the leftmost non-greedy quantifier matches as little of the string as possible while still allowing the whole regexp to match.

Continued
Quantifiers are also subject to backtracking.
Ex.
    $x = "the cat in the hat";
    $x =~ /^(.*)(at)(.*)$/;
Result: $1 = 'the cat in the h', $2 = 'at', $3 = ''
1 – Start with the first letter, 't'.
2 – The first quantifier, .*, starts by matching the whole string.
3 – The 'a' in (at) does not match the end of the string; backtrack one character.
4 – 'a' still does not match the last letter, 't'; backtrack once more.
5 – Now 'a' matches, followed by 't'.
6 – Move on to the 3rd element, (.*). The engine is already at the end of the string and .* can match zero times, so $3 is assigned the empty string.

Continued
Error alert!
Nested indeterminate quantifiers are dangerous.
Ex. /(a|b+)*/;
In the above example, the first repetition searches with b+ of whatever length (with no upper limit), and then searches again with the *
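The backtracking walk-through on the grouping slides can be checked directly. This is a minimal sketch (the word and year lists are hypothetical samples, not taken from the slides) that confirms "abcd" =~ /(af|ab)(ce|c|cd)/ matches only 'abc', and runs the /house(cat|keeper)/ and /(19|20|)\d\d/ examples with ^ and $ anchors added so that only whole strings count.

```perl
use strict;
use warnings;

# Backtracking example: group 1 tries 'af' (fails at 'b') and then 'ab';
# group 2 tries 'ce' (fails at 'd') and then 'c', which succeeds,
# so the third alternative 'cd' is never tried.
my ($group1, $group2, $whole);
if ("abcd" =~ /(af|ab)(ce|c|cd)/) {
    ($group1, $group2, $whole) = ($1, $2, $&);
    print "group 1 = '$group1', group 2 = '$group2', whole match = '$whole'\n";
}

# Grouping factors out a shared prefix: /^house(cat|keeper)$/ accepts
# the same strings as /^(housecat|housekeeper)$/.
# (The word list is a hypothetical sample.)
my @house = grep { /^house(cat|keeper)$/ } qw(housecat housekeeper houseboat);
print "house words: @house\n";

# The empty third alternative in (19|20|) makes the century optional.
# (The year list is a hypothetical sample.)
my @years = grep { /^(19|20|)\d\d$/ } qw(1999 2024 99 1850);
print "years: @years\n";
```

Without the anchors, /(19|20|)\d\d/ also succeeds on '1850': the empty alternative lets \d\d match the '18' at the start of the string.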

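A sketch of the Extracting Matches slides under use strict, where every variable has to be declared with my. The time string "12:34:56" and the word list are hypothetical samples. The sketch also shows how nested groups are numbered and that $4 stays undefined when the j branch of ((gi)|j) is the one that matches.

```perl
use strict;
use warnings;

# List-context capture: a successful match returns ($1, $2, $3).
# "12:34:56" is a hypothetical sample value.
my $time = "12:34:56";
my ($hours, $minutes, $seconds) = ($time =~ /(\d\d):(\d\d):(\d\d)/);
print "hours=$hours minutes=$minutes seconds=$seconds\n";

# Nested groups are numbered by the position of their opening parenthesis:
# $1 = (ab(cd|ef)((gi)|j)), $2 = (cd|ef), $3 = ((gi)|j), $4 = (gi)
my @nested;
if ("abcdgi" =~ /(ab(cd|ef)((gi)|j))/) {
    @nested = ($1, $2, $3, $4);
    print "abcdgi: ", join(', ', @nested), "\n";
}

# If the 'j' branch matches instead, group 4 takes no part and $4 is undef.
my ($three, $four);
if ("abefj" =~ /(ab(cd|ef)((gi)|j))/) {
    ($three, $four) = ($3, $4);
    print "abefj: \$3 = $three, \$4 is ", (defined $four ? "defined" : "undefined"), "\n";
}

# A backreference \1 must repeat exactly the text captured by group 1.
# (The word list is a hypothetical sample.)
my @doubled = grep { /^(\w\w\w)\1$/ } qw(booboo murmur boomur);
print "doubled: @doubled\n";
```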

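The @- and @+ slide reads each group through ${$expr}, a symbolic reference that use strict rejects. One strict-safe alternative, assumed here rather than taken from the slides, is to cut each group out of the string with substr using the recorded offsets. The second half checks the $`, $& and $' values from the "I like chips" example.

```perl
use strict;
use warnings;

# $-[n] is the offset where group n starts; $+[n] is the offset just past its end.
# $#- is the index of the last group that matched.
my $x = "Mmm...donut";
my @spans;
if ($x =~ /^(Mmm)\.\.\.(donut)/) {
    for my $n (1 .. $#-) {
        # substr with these offsets yields the same text as $1, $2, ...
        my $text = substr($x, $-[$n], $+[$n] - $-[$n]);
        push @spans, join(',', $n, $text, $-[$n], $+[$n]);
        printf "%d: '%s' at (%d,%d)\n", $n, $text, $-[$n], $+[$n];
    }
}

# Even with no groups, a match sets the pre-match, match and post-match variables.
my $y = "I like chips";
my ($pre, $match, $post);
if ($y =~ /like/) {
    ($pre, $match, $post) = ($`, $&, $');
    print "before='$pre' match='$match' after='$post'\n";
}
```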

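The greedy and non-greedy example slides can be run side by side. This sketch uses the slides' own strings and regexps, plus a non-greedy variant of the "the cat in the hat" regexp from the backtracking slide (that variant is an addition, not from the slides), and prints each list of captures.

```perl
use strict;
use warnings;

my $x = "The programming republic of Perl";

# Greedy (.+) grabs as much as possible, then backs off to the LAST 'e' or 'r'.
my @greedy = $x =~ /^(.+)(e|r)(.*)$/;
# Non-greedy (.+?) grabs as little as possible, stopping at the FIRST 'e' or 'r'.
my @lazy   = $x =~ /^(.+?)(e|r)(.*)$/;

# The leftmost quantifier has priority: greedy .* keeps the first 'm' of 'mm',
# leaving one for m{1,2}; non-greedy .*? stops early, so m{1,2} takes both.
my @greedy_m = $x =~ /.*(m{1,2})(.*)$/;
my @lazy_m   = $x =~ /.*?(m{1,2})(.*)$/;

# The backtracking slide's string: greedy (.*) reaches the last 'at',
# non-greedy (.*?) stops at the first one.
my $cat = "the cat in the hat";
my @cat_greedy = $cat =~ /^(.*)(at)(.*)$/;
my @cat_lazy   = $cat =~ /^(.*?)(at)(.*)$/;

for my $row ([greedy => \@greedy], [lazy => \@lazy],
             [greedy_m => \@greedy_m], [lazy_m => \@lazy_m],
             [cat_greedy => \@cat_greedy], [cat_lazy => \@cat_lazy]) {
    my ($label, $captures) = @$row;
    print "$label: ", join(' | ', map { "'$_'" } @$captures), "\n";
}
```

Each regexp is matched in list context, so the array receives ($1, $2, ...) directly; this is the same idiom as the ($hours, $minutes, $seconds) assignment on the Extracting Matches slide.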
