Regular Expressions in Java
Jan 13, 2019

  Regular Expressions
  Perl and Java
  A first example
  Doing it in Perl and Ruby
  Doing it in Java, I
  Doing it in Java, II
  Finding what was matched
  A complete example
  Additional methods
  Some simple patterns
  Sequences and alternatives
  Some predefined character classes
  Boundary matchers
  Greedy quantifiers
  Types of quantifiers
  Quantifier examples
  Capturing groups
  Capturing groups in Java
  Example use of capturing groups
  Double backslashes
  Escaping metacharacters
  Spaces
  Additions to the String class
  Thinking in regular expressions
  The End

Regular Expressions
- A regular expression is a kind of pattern that can be applied to text (Strings, in Java)
- A regular expression either matches the text (or part of the text), or it fails to match
  - If a regular expression matches a part of the text, then you can easily find out which part
  - If a regular expression is complex, then you can easily find out which parts of the regular expression match which parts of the text
  - With this information, you can readily extract parts of the text, or do substitutions in the text
- Regular expressions are an extremely useful tool for manipulating text
  - Regular expressions are heavily used in the automatic generation of Web pages

Perl and Java
- The Perl programming language is heavily used in server-side programming, because
  - Much server-side programming is text manipulation
  - Regular expressions are built into the syntax of Perl
- Beginning with Java 1.4, Java has a regular expression package, java.util.regex
  - Java’s regular expressions are almost identical to those of Perl
  - This new capability greatly enhances Java 1.4’s text handling
- Regular expressions in Java 1.4 are just a normal package, with no new syntax to support them
  - Java’s regular expressions are just as powerful as Perl’s, but
  - Regular expressions are easier and more convenient in Perl

A first example
- The regular expression "[a-z]+" will match a sequence of one or more lowercase letters
  - [a-z] means any character from a through z, inclusive
  - + means “one or more”
- Suppose we apply this pattern to the String "Now is the time"
- There are three ways we can apply this pattern:
  - To the entire string: it fails to match because the string contains characters other than lowercase letters
  - To the beginning of the string: it fails to match because the string does not begin with a lowercase letter
  - To search the string: it will succeed and match ow
    - If applied repeatedly, it will find is, then the, then time, then fail

Doing it in Perl and Ruby
- In both Perl and Ruby, a regular expression is written between forward slashes, for example, /[a-z]+/
- Regular expressions are values, and can be used as such
  - For example, line.split(/\s+/)
- We can search for matches to a regular expression with the =~ operator
  - For example, name = "Dave"; name =~ /[a-z]+/; will find ave

Doing it in Java, I
- First, you must compile the pattern
    import java.util.regex.*;
    Pattern p = Pattern.compile("[a-z]+");
- Next, you must create a matcher for a specific piece of text by sending a message to your pattern
    Matcher m = p.matcher("Now is the time");
- Points to notice:
  - Pattern and Matcher are both in java.util.regex
  - Neither Pattern nor Matcher has a public constructor; you create these by using methods in the Pattern class
  - The matcher contains information about both the pattern to use and the text to which it will be applied

Doing it in Java, II
- Now that we have a matcher m,
  - m.matches() returns true if the pattern matches the entire text string, and false otherwise
  - m.lookingAt() returns true if the pattern matches at the beginning of the text string, and false otherwise
  - m.find() returns true if the pattern matches any part of the text string, and false otherwise
    - If called again, m.find() will start searching from where the last match was found
    - m.find() will return true for as many matches as there are in the string; after that, it will return false
    - When m.find() returns false, matcher m will be reset to the beginning of the text string (and may be used again)

Finding what was matched
- After a successful match, m.start() will return the index of the first character matched
- After a successful match, m.end() will return the index of the last character matched, plus one
- If no match was attempted, or if the match was unsuccessful, m.start() and m.end() will throw an IllegalStateException
  - This is a RuntimeException, so you don’t have to catch it
- It may seem strange that m.end() returns the index of the last character matched plus one, but this is just what most String methods require
  - For example, "Now is the time".substring(m.start(), m.end()) will return exactly the matched substring

A complete example
    import java.util.regex.*;

    public class RegexTest {
        public static void main(String[] args) {
            String pattern = "[a-z]+";
            String text = "Now is the time";
            Pattern p = Pattern.compile(pattern);
            Matcher m = p.matcher(text);
            // Print each run of lowercase letters, followed by "*"
            while (m.find()) {
                System.out.print(text.substring(m.start(), m.end()) + "*");
            }
        }
    }

  Output: ow*is*the*time*

Additional methods
- If m is a matcher, then
  - m.replaceFirst(replacement) returns a new String where the first substring matched by the pattern has been replaced by replacement
  - m.replaceAll(replacement) returns a new String where every substring matched by the pattern has been replaced by replacement
  - m.find(startIndex) looks for the next pattern match, starting at the specified index
  - m.reset() resets this matcher
  - m.reset(newText) resets this matcher and gives it new text to examine (which may be a String, StringBuffer, or CharBuffer)

Some simple patterns
  abc            exactly this sequence of three letters
  [abc]          any one of the letters a, b, or c
  [^abc]         any character except one of the letters a, b, or c
                 (immediately within an open bracket, ^ means “not,” but anywhere else within the brackets it just means the character ^)
  [a-z]          any one character from a through z, inclusive
  [a-zA-Z0-9]    any one letter or digit

Sequences and alternatives
- If one pattern is followed by another, the two patterns must match consecutively
  - For example, [A-Za-z]+[0-9] will match one or more letters immediately followed by one digit
- The vertical bar, |, is used to separate alternatives
  - For example, the pattern abc|xyz will match either abc or xyz

Some predefined character classes
  .     any one character except a line terminator
  \d    a digit:
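
The character classes, sequences, and alternatives listed in the slides above can be tried out with a small sketch like the one below. The class name PatternSamples, the helper firstMatch, and the sample input strings are illustrative assumptions, not part of the original slides; only Pattern.compile, matcher, find, start, and end from java.util.regex are used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for experimenting with simple patterns.
class PatternSamples {
    // Returns the first substring of text matched by regex, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? text.substring(m.start(), m.end()) : null;
    }

    public static void main(String[] args) {
        // [abc]: any one of the letters a, b, or c
        System.out.println(firstMatch("[abc]", "xyzb"));             // b
        // [^abc]: any character except a, b, or c
        System.out.println(firstMatch("[^abc]", "abcd"));            // d
        // Sequence: one or more letters immediately followed by one digit
        System.out.println(firstMatch("[A-Za-z]+[0-9]", "room42"));  // room4
        // Alternatives: either abc or xyz
        System.out.println(firstMatch("abc|xyz", "wxyz"));           // xyz
        // No match at all: the helper returns null
        System.out.println(firstMatch("[0-9]", "no digits here"));   // null
    }
}
```

Returning null for "no match" keeps the sketch short; a real program might prefer to test m.find() directly, as the complete example does.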
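
As a recap of the three ways of applying a pattern (entire string, beginning of string, search) and of replaceFirst and replaceAll, here is a minimal sketch using the same pattern and text as the slides. The class name ApplyModes and the helper allMatches are assumptions made for illustration. A fresh matcher is created for each call so that no call depends on state left behind by a previous one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class comparing matches(), lookingAt(), and find().
class ApplyModes {
    // Collects every match of regex in text, each followed by '*', using find().
    static String allMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        StringBuilder found = new StringBuilder();
        while (m.find()) {
            found.append(text, m.start(), m.end()).append('*');
        }
        return found.toString();
    }

    public static void main(String[] args) {
        String text = "Now is the time";
        Pattern p = Pattern.compile("[a-z]+");

        // Entire string: fails, the text contains an uppercase letter and spaces.
        System.out.println(p.matcher(text).matches());          // false
        // Beginning of the string: fails, the text starts with 'N'.
        System.out.println(p.matcher(text).lookingAt());        // false

        // Search: succeeds and finds "ow" at indices 1 (inclusive) to 3 (exclusive).
        Matcher m = p.matcher(text);
        System.out.println(m.find());                           // true
        System.out.println(m.start() + ".." + m.end());         // 1..3

        // Repeated search, as in the complete example.
        System.out.println(allMatches("[a-z]+", text));         // ow*is*the*time*

        // Replacement methods return new Strings; text itself is unchanged.
        System.out.println(p.matcher(text).replaceFirst("#"));  // N# is the time
        System.out.println(p.matcher(text).replaceAll("#"));    // N# # # #
    }
}
```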