Regular Expressions in Java
Jan 14, 2019

Regular Expressions

- A regular expression is a kind of pattern that can be applied to text (Strings, in Java)
- A regular expression either matches the text (or part of the text), or it fails to match
  - If a regular expression matches a part of the text, then you can easily find out which part
  - If a regular expression is complex, then you can easily find out which parts of the regular expression match which parts of the text
  - With this information, you can readily extract parts of the text, or do substitutions in the text
- Regular expressions are an extremely useful tool for manipulating text
  - Regular expressions are heavily used in the automatic generation of Web pages

Perl and Java

- The Perl programming language is heavily used in server-side programming, because
  - Much server-side programming is text manipulation
  - Regular expressions are built into the syntax of Perl
- Beginning with Java 1.4, Java has a regular expression package, java.util.regex
  - Java's regular expressions are almost identical to those of Perl
  - This new capability greatly enhances Java 1.4's text handling
- Regular expressions in Java 1.4 are just a normal package, with no new syntax to support them
  - Java's regular expressions are just as powerful as Perl's, but
  - Regular expressions are easier and more convenient in Perl

A first example

- The regular expression "[a-z]+" will match a sequence of one or more lowercase letters
  - [a-z] means any character from a through z, inclusive
  - + means "one or more"
- Suppose we apply this pattern to the String "Now is the time"
- There are three ways we can apply this pattern:
  - To the entire string: it fails to match, because the string contains characters other than lowercase letters
  - To the beginning of the string: it fails to match, because the string does not begin with a lowercase letter
  - To search the string: it will succeed and match ow
    - If applied repeatedly, it will find is, then the, then time, then fail

Doing it in Java, I

- First, you must compile the pattern

    import java.util.regex.*;
    Pattern p = Pattern.compile("[a-z]+");

- Next, you must create a matcher for a specific piece of text by sending a message to your pattern

    Matcher m = p.matcher("Now is the time");

- Points to notice:
  - Pattern and Matcher are both in java.util.regex
  - Neither Pattern nor Matcher has a public constructor; you create these by using methods in the Pattern class
  - The matcher contains information about both the pattern to use and the text to which it will be applied

Doing it in Java, II

- Now that we have a matcher m:
  - m.matches() returns true if the pattern matches the entire text string, and false otherwise
  - m.lookingAt() returns true if the pattern matches at the beginning of the text string, and false otherwise
  - m.find() returns true if the pattern matches any part of the text string, and false otherwise
    - If called again, m.find() will start searching from where the last match was found
    - m.find() will return true for as many matches as there are in the string; after that, it will return false
    - Once m.find() has returned false, call m.reset() to return matcher m to the beginning of the text string so that it may be used again

Finding what was matched

- After a successful match, m.start() will return the index of the first character matched
- After a successful match, m.end() will return the index of the last character matched, plus one
- If no match was attempted, or if the match was unsuccessful, m.start() and m.end() will throw an IllegalStateException
  - This is a RuntimeException, so you don't have to catch it
- It may seem strange that m.end() returns the index of the last character matched plus one, but this is just what most String methods require
  - For example, "Now is the time".substring(m.start(), m.end()) will return exactly the matched substring

A complete example

    import java.util.regex.*;

    public class RegexTest {
        public static void main(String args[]) {
            String pattern = "[a-z]+";
            String text = "Now is the time";
            Pattern p = Pattern.compile(pattern);
            Matcher m = p.matcher(text);
            // Print each matched substring, followed by a separator
            while (m.find()) {
                System.out.print(text.substring(m.start(), m.end()) + "*");
            }
        }
    }

    Output: ow*is*the*time*

Additional methods

- If m is a matcher, then
  - m.replaceFirst(replacement) returns a new String where the first substring matched by the pattern has been replaced by replacement
  - m.replaceAll(replacement) returns a new String where every substring matched by the pattern has been replaced by replacement
  - m.find(startIndex) resets the matcher and looks for the next pattern match, starting at the specified index
  - m.reset() resets this matcher
  - m.reset(newText) resets this matcher and gives it new text to examine (which may be a String, StringBuffer, or CharBuffer)

Some simple patterns

    abc           exactly this sequence of three letters
    [abc]         any one of the letters a, b, or c
    [^abc]        any character except one of the letters a, b, or c
                  (immediately within an open bracket, ^ means "not,"
                  but anywhere else it just means the character ^)
    [a-z]         any one character from a through z, inclusive
    [a-zA-Z0-9]   any one letter or digit

Sequences and alternatives

- If one pattern is followed by another, the two patterns must match consecutively
  - For example, [A-Za-z]+[0-9] will match one or more letters immediately followed by one digit
- The vertical bar, |, is used to separate alternatives
  - For example, the pattern abc|xyz will match either abc or xyz

Some predefined character classes

    .     any one character except a line terminator
    \d    a digit: [0-9]
    \D    a non-digit: [^0-9]
    \s    a whitespace character: [ \t\n\x0B\f\r]
    \S    a non-whitespace character: [^\s]
    \w    a word character: [a-zA-Z_0-9]
    \W    a non-word character: [^\w]

Notice the space at the start of the \s class. Spaces are significant in regular expressions!

Boundary matchers

- These patterns match the empty string if at the specified position:

    ^     the beginning of a line
    $     the end of a line
    \b    a word boundary
    \B    not a word boundary
    \A    the beginning of the input (can be multiple lines)
    \Z    the end of the input except for the final terminator, if any
    \z    the end of the input
    \G    the end of the previous match

Greedy quantifiers

(The term "greedy" will be explained later)
Assume X represents some pattern

    X?       optional, X occurs once or not at all
    X*       X occurs zero or more times
    X+       X occurs one or more times
    X{n}     X occurs exactly n times
    X{n,}    X occurs n or more times
    X{n,m}   X occurs at least n but not more than m times

Note that these are all postfix operators, that is, they come after the operand

Types of quantifiers

- A greedy quantifier will match as much as it can, and back off if it needs to
- A reluctant quantifier will match as little as possible, then take more if it needs to
  - We'll do examples in a moment
- You make a quantifier …
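
As a recap of the matcher methods covered earlier, here is a minimal sketch that applies the pattern [a-z]+ to "Now is the time" in each of the three ways (entire string, beginning of string, search), and then uses replaceAll. The class name RegexRecap and the helper findAll are hypothetical names chosen for this illustration; only Pattern, Matcher, and their methods come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// RegexRecap is a hypothetical class name used only for this illustration.
class RegexRecap {

    // Hypothetical helper: collects every substring of text matched by regex,
    // in the order that repeated calls to find() report them.
    static List<String> findAll(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            // end() is one past the last matched character, as substring expects
            found.add(text.substring(m.start(), m.end()));
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Now is the time";
        Pattern p = Pattern.compile("[a-z]+");

        // Entire string: fails, the text contains 'N' and spaces
        System.out.println(p.matcher(text).matches());        // false

        // Beginning of string: fails, the text starts with 'N'
        System.out.println(p.matcher(text).lookingAt());      // false

        // Search: succeeds repeatedly
        System.out.println(findAll("[a-z]+", text));          // [ow, is, the, time]

        // Substitution: every match is replaced
        System.out.println(p.matcher(text).replaceAll("X"));  // NX X X X
    }
}
```

Each call above uses a fresh matcher, so no result depends on where a previous call left the matcher positioned.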