REGULAR EXPRESSIONS IN PERL: PART I
William Fisher

Regular Expressions

A regular expression is a string that represents a pattern.
It can be used to search strings, extract desired parts of a string, or do a search-and-replace operation on a string.
A basic regular expression uses the =~ operator as follows:

    "Hello World" =~ /World/;   # matches

This returns true if the string contains the desired pattern and false if it does not.
The result is reversed if the =~ is replaced with !~:

    "Hello World" !~ /World/;   # does not match

Regular Expressions (continued)

A variable can also be used in a regular expression.
With the default variable $_, the "$_ =~" part can be omitted.

    $greeting = "World";
    "Hello World" =~ /$greeting/;   # matches
    $_ = "Hello World";
    if (/World/) { ... }

The default delimiter / can be replaced by putting an m in front of the expression and choosing another delimiter. The default delimiter can then be used as a normal character.

    "Hello World" =~ m!World!;

Regular Expressions (continued)

Regular expressions match a string exactly, so they are case sensitive and consider ' ' (a space) to be a character.

    "Hello World" =~ /world/;    # doesn't match
    "Hello World" =~ /oW/;       # doesn't match
    "Hello World" =~ /World /;   # doesn't match

A regular expression also always matches the first instance of the pattern in the string.

    "That hat is red" =~ /hat/;   # matches 'hat' in 'That'

Metacharacters

Metacharacters can be used to make more complicated matches.
The metacharacters are: { } [ ] ( ) ^ $ . | * + ? \
Metacharacters are treated as regular characters if preceded by a backslash.

    "The interval is [0,1)." =~ /\[0,1\)\./;   # matches
    "/usr/bin/perl" =~ /\/usr\/bin\/perl/;     # matches

Escape Sequences

Escape sequences represent ASCII characters that have no printable equivalent, such as \t, \n, \r, and \a.
They can be included in regular expressions just like any other character.

    "1000\n2000" =~ /0\n20/;   # matches

A backslash followed by three digits represents a character by its octal code, and a backslash followed by a lower-case x and two hexadecimal digits (0-F) represents a character by its hexadecimal code.

    "cat" =~ /\143\x61\x74/;   # matches

Variables in Regular Expressions

Variables can be included in regular expressions in the same way they are interpolated in double-quoted strings in Perl.

    $foo = 'house';
    'housecat' =~ /$foo/;        # matches
    'cathouse' =~ /cat$foo/;     # matches
    'housecat' =~ /${foo}cat/;   # matches

Anchor Metacharacters

The ^ and $ metacharacters can be used to require the expression to match at the beginning and end of a string, respectively.
The $ matches even if there is a \n at the end of the string.

    "housekeeper" =~ /^keeper/;     # doesn't match
    "housekeeper" =~ /keeper$/;     # matches
    "housekeeper\n" =~ /keeper$/;   # matches

When both are used, the entire string must match the pattern.

    "keeper" =~ /^keep$/;     # doesn't match
    "keeper" =~ /^keeper$/;   # matches
    "" =~ /^$/;               # ^$ matches an empty string

Character Classes

Character classes match a set of possible characters, which are contained within brackets.

    /[bcr]at/;             # matches 'bat', 'cat', or 'rat'
    /item[0123456789]/;    # matches 'item0' or ... or 'item9'
    /[yY][eE][sS]/;        # matches 'yes' in a case-insensitive way

Another way to get case insensitivity is the i modifier:

    /yes/i;

Special Characters and Range Operators in Character Classes

Special characters can also be used in character classes: variables are interpolated as they are elsewhere, and a special character preceded by a backslash is matched literally.
The range operator - can be used to represent contiguous sets of characters as ranges, such as 0-9 or a-z.

    $x = 'bcr';
    /[$x]at/;      # matches 'bat', 'cat', or 'rat'
    /[\$x]at/;     # matches '$at' or 'xat'
    /item[0-9]/;   # matches 'item0' or ... or 'item9'

The range operator is treated as an ordinary character if it is at the beginning or end of the character class.

Negation in a Character Class

A ^ at the beginning of a class means that the character can be anything except what is included in the class.

    /[^a]at/;   # doesn't match 'aat' or 'at', but matches all other
                # 'bat', 'cat', '0at', '%at', etc.
    /[^0-9]/;   # matches a non-numeric character
    /[a^]at/;   # matches 'aat' or '^at'; here '^' is ordinary

Common Character Classes

Certain common character classes have abbreviations:

    \d   represents [0-9]
    \s   represents [\ \t\r\n\f] (a whitespace character)
    \w   represents [0-9a-zA-Z_]
    .    represents any character except \n

\D, \S, and \W represent the negation of the character classes of their lower-case equivalents.
These abbreviations can be used inside or outside of character classes.
A period must be escaped or put in a character class to be used as a normal character.

Word Anchor

The character \b matches a boundary between a word character and a non-word character: \w\W or \W\w.

    $x = "Housecat catenates house and cat";
    $x =~ /cat/;       # matches cat in 'housecat'
    $x =~ /\bcat/;     # matches cat in 'catenates'
    $x =~ /cat\b/;     # matches cat in 'housecat'
    $x =~ /\bcat\b/;   # matches 'cat' at end of string

s and m Modifiers

The s modifier treats the string as a single line, and therefore the . character class will include \n.
The m modifier makes the anchor metacharacters treat each line as a new string, so that the match can be at the beginning or end of any line.

    $x = "There once was a girl\nWho programmed in Perl\n";
    $x =~ /girl.Who/s;   # matches, "." matches "\n"
    $x =~ /^Who/m;       # matches, "Who" at start of second line

These modifiers can be combined (//sm) to get both of these effects.
When using the m modifier, \A and \Z can still be used to match the beginning and the end of the string, respectively; \Z ignores a final \n. \z matches only at the very end of the string and does not ignore the final \n.

Alternation Metacharacter

The | metacharacter can be used to match more than one possible string.
Position in the string still takes precedence: the alternative that matches earliest in the string wins, regardless of the order in which the alternatives are listed.

    "cats and dogs" =~ /cat|dog|bird/;   # matches "cat"
    "cats and dogs" =~ /dog|cat|bird/;   # matches "cat"

In cases where more than one alternative applies at the same position, the first one listed is used.

    "cats" =~ /c|ca|cat|cats/;   # matches "c"
    "cats" =~ /cats|cat|ca|c/;   # matches "cats"

Source

Kvale, Mark. "Perl regular expressions tutorial." 2000. http www cs drexel edu knowak cs265 fall 2009 perlretut 2007 pdf

Questions?
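
The rules on the word-anchor, modifier, and alternation slides can be checked with a short script. The following is a minimal sketch rather than part of the original deck: the helper names matched_text, match_start, and matches are hypothetical, and the script relies on qr//, $&, and $-[0], which the slides do not cover.

```perl
use strict;
use warnings;

# Hypothetical helpers, not from the slides.

# Returns the substring that $regex matched in $string ($&), or undef.
sub matched_text {
    my ($string, $regex) = @_;
    return $string =~ $regex ? $& : undef;
}

# Returns the offset where the match starts ($-[0]), or undef.
sub match_start {
    my ($string, $regex) = @_;
    return $string =~ $regex ? $-[0] : undef;
}

# Returns 1 if $regex matches $string, 0 otherwise.
sub matches {
    my ($string, $regex) = @_;
    return $string =~ $regex ? 1 : 0;
}

# Alternation: the earliest position in the string wins,
# whatever the order of the alternatives.
print "cat|dog|bird: ", matched_text("cats and dogs", qr/cat|dog|bird/), "\n";  # cat
print "dog|cat|bird: ", matched_text("cats and dogs", qr/dog|cat|bird/), "\n";  # cat

# At the same position, the first alternative that matches is used.
print "c|ca|cat|cats: ", matched_text("cats", qr/c|ca|cat|cats/), "\n";  # c
print "cats|cat|ca|c: ", matched_text("cats", qr/cats|cat|ca|c/), "\n";  # cats

# Word anchor: the offsets show which 'cat' each pattern finds.
my $x = "Housecat catenates house and cat";
print "cat starts at ",       match_start($x, qr/cat/),     "\n";  # 5
print "\\bcat starts at ",    match_start($x, qr/\bcat/),   "\n";  # 9
print "cat\\b starts at ",    match_start($x, qr/cat\b/),   "\n";  # 5
print "\\bcat\\b starts at ", match_start($x, qr/\bcat\b/), "\n";  # 29

# s and m modifiers, and \Z versus \z (1 = match, 0 = no match).
my $y = "There once was a girl\nWho programmed in Perl\n";
print "girl.Who without s: ", matches($y, qr/girl.Who/),  "\n";  # 0
print "girl.Who with s: ",    matches($y, qr/girl.Who/s), "\n";  # 1
print "^Who without m: ",     matches($y, qr/^Who/),      "\n";  # 0
print "^Who with m: ",        matches($y, qr/^Who/m),     "\n";  # 1
print "Perl\\Z: ",            matches($y, qr/Perl\Z/),    "\n";  # 1
print "Perl\\z: ",            matches($y, qr/Perl\z/),    "\n";  # 0
```

Reporting the match offset is one way to see which occurrence a pattern picked; printing the matched text alone would not distinguish the three occurrences of "cat".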


DREXEL CS 265 - Regular Expressions in Perl
