Module 2: Intro to RegEx
Module Overview
Learn about Regular Expressions (RegEx) in Java and how to use them for pattern matching, validation, and text processing. Regular expressions provide a powerful way to search, extract, and manipulate text based on patterns.
Learning Objectives
- Reading data from files into strings
- Parsing strings into classes using RegEx
- RegEx for digits
- RegEx for words
- RegEx capture groups
- RegEx named capture groups
- RegEx for whitespace
- RegEx quantifiers
RegEx Fundamentals
Regular expressions (RegEx) are sequences of characters that define a search pattern. They are used for pattern matching within text and are supported in most programming languages, including Java.
Common RegEx Patterns
Here are some common patterns used in regular expressions:
- \d - Matches any digit (0-9)
- \w - Matches any word character (alphanumeric plus underscore)
- \s - Matches any whitespace character (spaces, tabs, line breaks)
- . - Matches any character except newline
- * - Matches 0 or more of the preceding element
- + - Matches 1 or more of the preceding element
- ? - Matches 0 or 1 of the preceding element
- {n} - Matches exactly n occurrences of the preceding element
- {n,} - Matches n or more occurrences of the preceding element
- {n,m} - Matches between n and m occurrences of the preceding element
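As a quick illustration of the patterns above, here is a minimal sketch that checks a few inputs against them. The class name PatternBasics and the helper fullMatch are hypothetical names chosen for this example. Note that inside a Java string literal each backslash must be doubled, so \d is written as "\\d".

```java
import java.util.regex.Pattern;

public class PatternBasics {
    // Hypothetical helper: true only when the ENTIRE input matches the regex.
    public static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("\\d{3}", "123"));             // true: exactly three digits
        System.out.println(fullMatch("\\d{3}", "12"));              // false: only two digits
        System.out.println(fullMatch("\\w+", "user_42"));           // true: word characters only
        System.out.println(fullMatch("\\w+\\s\\w+", "Java RegEx")); // true: word, whitespace, word
        System.out.println(fullMatch("colou?r", "color"));          // true: the 'u' is optional
        System.out.println(fullMatch("a.c", "abc"));                // true: '.' matches 'b'
        System.out.println(fullMatch("\\d{2,4}", "12345"));         // false: five digits is too many
    }
}
```

Pattern.matches requires the whole input to match, which is usually what you want for validation. To search for a pattern anywhere inside a longer string, use Matcher.find() instead.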
Capture Groups
Capture groups allow you to extract parts of the matched text:
- (pattern) - Creates a capture group with the matched pattern
- (?<name>pattern) - Creates a named capture group
Example Implementation
Here's how you might use RegEx in Java for various common tasks:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simple pattern matching: String.matches() must match the entire string
String text = "Java Regular Expressions";
boolean matches = text.matches(".*Regular.*"); // true

// Finding all matches in a string
String contactText = "Contact us at support@example.com or sales@example.com";
Pattern emailPattern = Pattern.compile("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}");
Matcher emailMatcher = emailPattern.matcher(contactText);
while (emailMatcher.find()) {
    System.out.println("Found email: " + emailMatcher.group());
}

// Using capture groups to extract data
String dateString = "Today is 2023-05-15";
Pattern datePattern = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");
Matcher dateMatcher = datePattern.matcher(dateString);
if (dateMatcher.find()) {
    String year = dateMatcher.group(1);  // 2023
    String month = dateMatcher.group(2); // 05
    String day = dateMatcher.group(3);   // 15
    System.out.println("Year: " + year + ", Month: " + month + ", Day: " + day);
}

// Using named capture groups
String nameString = "John Doe";
Pattern namePattern = Pattern.compile("(?<firstName>\\w+)\\s(?<lastName>\\w+)");
Matcher nameMatcher = namePattern.matcher(nameString);
if (nameMatcher.find()) {
    String firstName = nameMatcher.group("firstName"); // John
    String lastName = nameMatcher.group("lastName");   // Doe
    System.out.println("First name: " + firstName + ", Last name: " + lastName);
}
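The learning objectives also mention parsing strings into classes. The following is one possible sketch of that idea, not a prescribed implementation: the PersonParser and Person names and the line format "<firstName> <lastName>, <age>" are assumptions made up for this example. The input is an in-memory string here; in practice the text could be read from a file first, for example with Files.readString (Java 11+).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PersonParser {
    // Assumed (hypothetical) line format: "<firstName> <lastName>, <age>"
    private static final Pattern LINE = Pattern.compile(
            "(?<firstName>\\w+)\\s+(?<lastName>\\w+),\\s*(?<age>\\d{1,3})");

    // Simple value class holding the parsed fields.
    public static final class Person {
        public final String firstName;
        public final String lastName;
        public final int age;

        Person(String firstName, String lastName, int age) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.age = age;
        }
    }

    // Returns a Person when the whole line fits the format, otherwise empty.
    public static Optional<Person> parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        // \d{1,3} guarantees parseInt cannot fail or overflow here.
        int age = Integer.parseInt(m.group("age"));
        return Optional.of(new Person(m.group("firstName"), m.group("lastName"), age));
    }

    public static void main(String[] args) {
        String data = "John Doe, 42\nnot a record\nJane Roe, 37";
        // \R matches any line break sequence.
        for (String line : data.split("\\R")) {
            parse(line).ifPresent(p ->
                    System.out.println(p.lastName + ", " + p.firstName + " (" + p.age + ")"));
        }
        // Prints "Doe, John (42)" and "Roe, Jane (37)"; the malformed line is skipped.
    }
}
```

Returning an Optional keeps malformed lines from throwing, so the caller decides whether to skip them or report an error.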
Mastery Task 2: Validate User Info
Mastery Task Guidelines
Mastery Tasks are opportunities to test your knowledge and understanding through code. When a mastery task is shown in a module, it means that we've covered all the concepts that you need to complete that task.
To pass each sprint, each mastery task must pass 100% of the automated tests and code style checks. Your code must be your own. If you have any questions, feel free to reach out for support.
RegEx Validation
We have two Validator classes that should be used to enforce restrictions on usernames, passwords, and emails during User instantiation. At the moment, each of these validators simply returns true. Using RegEx pattern matchers, implement the following rules for each of the corresponding validation methods:
username
- Must be at least 4 characters in length.
- Must begin with an uppercase letter.
- Must contain only letters and numbers.
password
- Must be at least 8 characters in length.
- Must contain at least one uppercase letter.
- Must contain at least one lowercase letter.
- Must contain at least one number.
- May only contain letters, numbers, and the symbols !@#$%^&*.
email
- Must fit the standard email format {name}@{domain}.{identifier}.
- The {name} field must contain at least one letter, digit, _, or .
- The {domain} and {identifier} fields must only contain letters.
- Each field must contain at least one character.
HINT: It may be helpful to break validation into multiple pattern matching steps rather than doing it all at once.
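To illustrate the multi-step idea from the hint without giving away the task, here is a rough sketch for a different, made-up rule set. The CouponCodeValidator class and its rules (6 to 10 characters, only uppercase letters and digits, at least one letter and at least one digit) are hypothetical and are not part of the project.

```java
import java.util.regex.Pattern;

public class CouponCodeValidator {
    // Step 1: allowed characters and length, checked against the whole string.
    private static final Pattern ALLOWED = Pattern.compile("[A-Z0-9]{6,10}");
    // Step 2: at least one digit somewhere in the string.
    private static final Pattern HAS_DIGIT = Pattern.compile("\\d");
    // Step 3: at least one uppercase letter somewhere in the string.
    private static final Pattern HAS_LETTER = Pattern.compile("[A-Z]");

    public static boolean isValid(String code) {
        if (code == null) {
            return false;
        }
        // matches() tests the entire input; find() looks for the pattern anywhere.
        return ALLOWED.matcher(code).matches()
                && HAS_DIGIT.matcher(code).find()
                && HAS_LETTER.matcher(code).find();
    }

    public static void main(String[] args) {
        System.out.println(isValid("SAVE2024")); // true
        System.out.println(isValid("SAVINGS"));  // false: no digit
        System.out.println(isValid("123456"));   // false: no letter
        System.out.println(isValid("save2024")); // false: lowercase not allowed
    }
}
```

Each rule lives in its own small pattern, which tends to be easier to read and test than a single expression that tries to encode every rule at once.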
Completion
Once the UserInfoValidator and EmailValidator methods have been fully implemented, the tests under the MT02_RegexValidators class should all pass:
./gradlew -q clean :test --tests 'com.bloomtech.socialfeed.MT02*'