Regular Expressions in Perl, Part I
by Ayush Gupta

Simplest Regular Expression

The simplest regexp is simply a word, or more generally, a string of characters. A regexp consisting of a word matches any string that contains that word.

    "Hello World" =~ /World/;    # matches

The =~ operator associates the string with the regexp match and produces a true value if the regexp matched, or false if the regexp did not match.

Simple Word Matching

In the example on the previous slide, World matches the second word in "Hello World", so the expression is true. Expressions like this are useful in conditionals:

    if ("Hello World" =~ /World/) {
        print "It matches\n";
    }
    else {
        print "It doesn't match\n";
    }

The literal string in the regexp can be replaced by a variable, like:

    $greeting = "World";

The if statement would then be:

    if ("Hello World" =~ /$greeting/)

Simple Word Matching (cont'd)

If you're matching against the special default variable $_, the $_ =~ part can be omitted:

    $_ = "Hello World";
    if (/World/) {
        print "It matches\n";
    }
    else {
        print "It doesn't match\n";
    }

The default / / delimiters for a match can be changed to arbitrary delimiters by putting an m out front:

    "Hello World" =~ m!World!;    # matches, delimited by '!'

Match or No Match

    "Hello World" =~ /world/;     # doesn't match, because regexps are case sensitive
    "Hello World" =~ /o W/;       # matches
    "Hello World" =~ /oW/;        # doesn't match, because of a lack of a space character
    "Hello World" =~ /World /;    # doesn't match, because there is a space at the end
                                  # of the regexp but not at the end of the string

A regular expression must match a part of the string exactly in order for the statement to be true.

Metacharacters

These characters are reserved for use in regexp notation. The metacharacters are:

    { } [ ] ( ) ^ $ . | * + ? \

A metacharacter can be matched literally by putting a backslash before it:

    "2+2=4" =~ /2+2/;     # doesn't match, + is a metacharacter
    "2+2=4" =~ /2\+2/;    # matches, \+ is treated like an ordinary +

The backslash character is a metacharacter itself and needs to be backslashed:

    'C:\WIN32' =~ /C:\\WIN/;    # matches

Escape Sequences

In addition to the metacharacters, there are some ASCII characters which don't have printable character equivalents. Common examples are \t for a tab, \n for a newline, \r for a carriage return, and \a for a bell.

    "1000\t2000" =~ m(0\t2);      # matches
    "1000\n2000" =~ /0\n20/;      # matches
    "1000\t2000" =~ /\000\t2/;    # doesn't match, "0" ne "\000"

Anchor Metacharacters

^ and $ are the anchor metacharacters. The anchor ^ means match at the beginning of the string. The anchor $ means match at the end of the string, or before a newline at the end of the string.

    "housekeeper" =~ /keeper/;       # matches
    "housekeeper" =~ /^keeper/;      # doesn't match
    "housekeeper" =~ /keeper$/;      # matches
    "housekeeper\n" =~ /keeper$/;    # matches

When both ^ and $ are used at the same time, the regexp has to match both the beginning and the end of the string, i.e., the regexp matches the whole string.

Character Class

A character class allows a set of possible characters, rather than just a single character, to match at a particular point in a regexp. Character classes are denoted by brackets [...], with the set of characters to be possibly matched inside.

    /[bcr]at/;    # matches 'bat', 'cat', or 'rat'

The special character - acts as a range operator within character classes, so [0123456789] becomes [0-9]. The special character ^ in the first position of a character class denotes a negated character class, which matches any character but those in the brackets.

Alternation Metacharacter

The alternation metacharacter | enables a regexp to match different possible words or character strings. To match dog or cat, we form the regexp dog|cat.

    "cats and dogs" =~ /cat|dog|bird/;    # matches "cat"
    "cats and dogs" =~ /dog|cat|bird/;    # matches "cat"

Even though dog is the first alternative in the second regexp, cat is able to match earlier in the string.
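
The matching rules covered in these slides can be tried out with a short script. What follows is only a minimal sketch: the helper subroutines matches and matched_text are hypothetical names made up for illustration, and the qr// operator (which compiles a regexp so it can be stored in a variable) and the $& variable (which holds the text of the last successful match) are standard Perl features that the slides themselves do not cover.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the slides: returns 1 if $string
# matches the compiled regexp $re, and 0 otherwise.
sub matches {
    my ($string, $re) = @_;
    return $string =~ $re ? 1 : 0;
}

# Hypothetical helper, not from the slides: returns the text that the
# regexp actually matched, or undef if there was no match.
sub matched_text {
    my ($string, $re) = @_;
    return $string =~ $re ? $& : undef;
}

# Each case is [ string, regexp, description ].
my @cases = (
    [ 'Hello World',   qr/World/,        'plain word'                ],
    [ 'Hello World',   qr/world/,        'case sensitivity'          ],
    [ '2+2=4',         qr/2\+2/,         'escaped + metacharacter'   ],
    [ 'C:\WIN32',      qr/C:\\WIN/,      'escaped backslash'         ],
    [ "1000\t2000",    qr/0\t2/,         'tab escape sequence'       ],
    [ 'housekeeper',   qr/^keeper/,      '^ anchor'                  ],
    [ "housekeeper\n", qr/keeper$/,      '$ before trailing newline' ],
    [ 'rat',           qr/[bcr]at/,      'character class'           ],
    [ 'hat',           qr/[^bcr]at/,     'negated character class'   ],
    [ 'cats and dogs', qr/dog|cat|bird/, 'alternation'               ],
);

for my $case (@cases) {
    my ($string, $re, $label) = @$case;
    my $result = matches($string, $re) ? 'matches' : "doesn't match";
    printf "%-28s %s\n", $label, $result;
}

# Alternation: the match that starts earliest in the string wins,
# regardless of the order of the alternatives.
my $text = matched_text('cats and dogs', qr/dog|cat|bird/);
print "alternation matched: $text\n";
```

Keeping the cases in a table makes it easy to add a string and regexp pair and see at a glance whether it behaves as the slides describe.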