DREXEL CS 265 - Regular Expressions

This preview shows page 1-2-3 out of 10 pages.


Regular Expressions in Perl Part I
by Ayush Gupta

• Simplest Regular Expression
• Simple Word Matching
• Simple Word Matching (cont’d)
• Match or No Match?
• Metacharacters
• Escape Sequences
• Anchor Metacharacters
• Character Class
• Alternation Metacharacter “|”

Simplest Regular Expression
• The simplest regexp is simply a word or, more generally, a string of characters.
• A regexp consisting of a word matches any string that contains that word:
    "Hello World" =~ /World/;   # matches
• The operator =~ associates the string with the regexp match and produces a true value if the regexp matched, or a false value if it did not.

Simple Word Matching
• In the example on the previous slide, "World" matches the second word in "Hello World", so the expression is true. Expressions like this are useful in conditionals:
    if ("Hello World" =~ /World/) {
        print "It matches\n";
    }
    else {
        print "It doesn't match\n";
    }
• The literal string in the regexp can be replaced by a variable, for example:
    $greeting = "World";
• The if statement would then be:
    if ("Hello World" =~ /$greeting/)

Simple Word Matching (cont’d)
• If you're matching against the special default variable $_, the $_ =~ part can be omitted:
    $_ = "Hello World";
    if (/World/) {
        print "It matches\n";
    }
    else {
        print "It doesn't match\n";
    }
• The default // delimiters for a match can be changed to arbitrary delimiters by putting an 'm' in front:
    "Hello World" =~ m!World!;  # matches, delimited by '!'

Match or No Match?
• "Hello World" =~ /world/;    # doesn't match: regexps are case-sensitive
• "Hello World" =~ /o W/;      # matches
• "Hello World" =~ /oW/;       # doesn't match: the space character is missing
• "Hello World" =~ /World /;   # doesn't match: there is a space at the end of the regexp, but not at the end of the string
• A regular expression must match some part of the string exactly for the statement to be true.

Metacharacters
• Metacharacters are characters reserved for use in regexp notation.
• The metacharacters are {}[]()^$.|*+?\
• A metacharacter can be matched literally by putting a backslash before it:
    "2+2=4" =~ /2+2/;    # doesn't match, + is a metacharacter
    "2+2=4" =~ /2\+2/;   # matches, \+ is treated like an ordinary +
• The backslash character '\' is itself a metacharacter and needs to be backslashed:
    'C:\WIN32' =~ /C:\\WIN/;   # matches

Escape Sequences
• In addition to the metacharacters, some ASCII characters have no printable equivalent and are represented by escape sequences.
• Common examples are \t for a tab, \n for a newline, \r for a carriage return and \a for a bell.
    "1000\t2000" =~ m(0\t2);      # matches
    "1000\n2000" =~ /0\n20/;      # matches
    "1000\t2000" =~ /\000\t2/;    # doesn't match, "0" ne "\000"

Anchor Metacharacters
• ^ and $ are the anchor metacharacters.
• The anchor ^ means match at the beginning of the string.
• The anchor $ means match at the end of the string, or before a newline at the end of the string.
    "housekeeper" =~ /keeper/;      # matches
    "housekeeper" =~ /^keeper/;     # doesn't match
    "housekeeper" =~ /keeper$/;     # matches
    "housekeeper\n" =~ /keeper$/;   # matches
• When both ^ and $ are used at the same time, the regexp has to match both the beginning and the end of the string, i.e., the regexp matches the whole string.

Character Class
• A character class allows a set of possible characters, rather than just a single character, to match at a particular point in a regexp.
• Character classes are denoted by brackets [...], with the set of characters that may be matched inside:
    /[bcr]at/;   # matches 'bat', 'cat', or 'rat'
• The special character '-' acts as a range operator within character classes, so [0123456789] can be written as [0-9].
• The special character ^ in the first position of a character class denotes a negated character class, which matches any character except those in the brackets.

Alternation Metacharacter “|”
• The alternation metacharacter | lets a regexp match any one of several possible words or character strings.
• To match dog or cat, we form the regexp dog|cat.
    "cats and dogs" =~ /cat|dog|bird/;   # matches "cat"
    "cats and dogs" =~ /dog|cat|bird/;   # matches "cat"
• Even though dog is the first alternative in the second regexp, cat is able to match earlier in the

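The match / no-match claims from the slides can be checked mechanically. The following is a minimal sketch, not part of the original slides: the helper name check_cases and the table layout are assumptions made for illustration, and qr// is used only to store each regexp in a list.

```perl
use strict;
use warnings;

# Each row: [string, regexp, expected result (1 = match, 0 = no match)].
# The strings and patterns are taken from the slides; the last row is an
# added example of using ^ and $ together.
my @cases = (
    [ "Hello World",   qr/World/,         1 ],
    [ "Hello World",   qr/world/,         0 ],  # regexps are case-sensitive
    [ "Hello World",   qr/o W/,           1 ],
    [ "Hello World",   qr/oW/,            0 ],  # space is missing
    [ "Hello World",   qr/World /,        0 ],  # trailing space in regexp
    [ "2+2=4",         qr/2+2/,           0 ],  # + is a metacharacter
    [ "2+2=4",         qr/2\+2/,          1 ],  # escaped +
    [ 'C:\WIN32',      qr/C:\\WIN/,       1 ],  # escaped backslash
    [ "1000\t2000",    qr/0\t2/,          1 ],
    [ "1000\t2000",    qr/\000\t2/,       0 ],  # "0" ne "\000"
    [ "housekeeper",   qr/^keeper/,       0 ],
    [ "housekeeper",   qr/keeper$/,       1 ],
    [ "housekeeper\n", qr/keeper$/,       1 ],  # $ also matches before a final newline
    [ "housekeeper",   qr/^housekeeper$/, 1 ],  # both anchors: whole string
);

# Hypothetical helper: returns the regexps whose result differs from the table.
sub check_cases {
    my @failures;
    for my $case (@_) {
        my ($string, $regexp, $expected) = @$case;
        my $matched = ($string =~ $regexp) ? 1 : 0;
        push @failures, "$regexp" if $matched != $expected;
    }
    return @failures;
}

my @failed = check_cases(@cases);
print @failed ? "Unexpected: @failed\n" : "All cases behave as described\n";
```

Keeping the cases in a table makes it easy to add a row and see whether a new regexp behaves as expected before using it in a conditional.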

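To see which alternative actually matched and where, a short sketch along the following lines may help. The helper first_match is a hypothetical name introduced here; it relies on Perl's built-in $& (the matched text) and $-[0] (the offset where the match starts). The "item7" strings are made-up examples for the range and negated character classes.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the matched text and its starting offset,
# or an empty list if the regexp does not match.
sub first_match {
    my ($string, $regexp) = @_;
    return unless $string =~ $regexp;
    return ($&, $-[0]);
}

# "dog" is listed first, but "cat" can match at an earlier position.
my ($text, $pos) = first_match("cats and dogs", qr/dog|cat|bird/);
print "matched '$text' at offset $pos\n";

# Character class as a set of characters: 'hat' is not in [bcr]at.
for my $word (qw(bat cat rat hat)) {
    my $result = ($word =~ /[bcr]at/) ? "matches" : "does not match";
    print "$word $result /[bcr]at/\n";
}

# Range and negated class (illustrative strings, not from the slides).
print "range ok\n"   if "item7" =~ /item[0-9]/;
print "negated ok\n" if "item7" !~ /item[^0-9]/;
```

The offset shows that the regexp engine reports the match that starts earliest in the string, regardless of the order in which the alternatives are written.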