This post was last edited by woaidownload on 2024-5-10 09:38
Chapter 7, "Pattern Matching with Regular Expressions", from Automate the Boring Stuff with Python, 2nd Edition
This chapter is fairly long. The author explains in detail how to match and search strings with regular expressions, which is a great help when working with regexes.
Exercises
- What is the function that creates a Regex object?
Answer:
Create a Regex object through re.compile().
- Why are raw strings often used when creating Regex objects?
Answer:
Raw strings are used so that the backslashes in a regular expression do not have to be escaped: you can write r'\d' instead of '\\d'.
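As a small illustration of the raw-string point, the sketch below compares a raw pattern with its manually escaped equivalent. The sample text 'Room 101' and the variable names are made up for this example.

```python
import re

# Both literals produce the same three characters: backslash, d, plus.
raw_pattern = r'\d+'
escaped_pattern = '\\d+'
same_pattern = raw_pattern == escaped_pattern

# The raw form is easier to read when handed to re.compile().
found = re.compile(raw_pattern).search('Room 101').group()
print(same_pattern, found)
```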
- What does the search() method return?
Answer:
The search() method returns a Match object, or None if the pattern is not found.
- How do you get the actual string that matches the pattern through the Match object?
Answer:
The Match object has a group() method that returns the actual matching text in the search string.
- In the regular expression created with r'(\d\d\d)-(\d\d\d-\d\d\d\d)', what does group 0 represent? What does group 1 represent? What does group 2 represent?
Answer:
Group 0 is the entire matched text, group 1 is the text matched by the first set of parentheses, and group 2 is the text matched by the second set of parentheses.
- Parentheses and periods have special meanings in regular expression syntax. How do I specify a regular expression that matches actual parenthesis and period characters?
Answer:
In regular expressions, parentheses create groups, and a period is a wildcard that matches any character except a newline. To match literal parentheses and periods, escape them with a backslash: \( \) \.
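The group numbering can be checked directly. This is a minimal sketch; the sample sentence and phone number are chosen for illustration only.

```python
import re

phone_regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phone_regex.search('My number is 415-555-4242.')

whole = mo.group(0)        # entire matched text (same as mo.group())
area_code = mo.group(1)    # text matched by the first group
main_number = mo.group(2)  # text matched by the second group
print(whole, area_code, main_number)
```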
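A short sketch of the escaping rule, using a made-up sample string. The backslashes make the pattern look for the literal characters instead of treating them as a group or a wildcard.

```python
import re

# \( and \) match real parentheses; \. matches a real period.
literal_regex = re.compile(r'\(\d\d\d\)\.')
matched = literal_regex.search('Call (415). Thanks').group()

# Without the escape, the period is a wildcard and also matches '!'.
wildcard_hit = re.search(r'\(\d\d\d\).', 'Call (415)! Thanks') is not None
literal_miss = re.search(r'\(\d\d\d\)\.', 'Call (415)! Thanks') is None
print(matched, wildcard_hit, literal_miss)
```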
- The findall() method returns a list of strings or a list of tuples of strings. What determines which kind of return it provides?
Answer:
If the regular expression has no groups, findall() returns a list of strings containing the full matches. If it has exactly one group, findall() returns a list of the strings matched by that group. If it has two or more groups, findall() returns a list of tuples of strings, one tuple per match.
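The effect of groups on findall() can be seen by running the same text through patterns with zero, one, and two groups. The sample text here is invented for the demonstration.

```python
import re

text = 'Cell: 415-555-9999 Work: 212-555-0000'

# No groups: a list of full-match strings.
no_groups = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d').findall(text)
# One group: a list of strings, one per match, holding that group only.
one_group = re.compile(r'(\d\d\d)-\d\d\d-\d\d\d\d').findall(text)
# Two groups: a list of tuples, one tuple per match.
two_groups = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)').findall(text)
print(no_groups, one_group, two_groups)
```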
- In a regular expression, what does the | character mean?
Answer:
In a regular expression, the | character is called a "pipe". It matches either the expression on its left or the expression on its right, so it can be used to match one of several alternatives.
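A brief sketch of alternation. When both alternatives occur in the text, search() returns whichever appears first; the sample strings are illustrative.

```python
import re

hero_regex = re.compile(r'Batman|Tina Fey')
first = hero_regex.search('Batman and Tina Fey').group()
second = hero_regex.search('Tina Fey and Batman').group()
print(first, second)
```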
- In regular expressions, what are the two meanings of the ? character?
Answer:
In regular expressions, the ? character either marks the preceding group as optional (match it zero or one time) or, when placed after another quantifier, requests a non-greedy match.
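Both uses of ? can be tried side by side. This is only a sketch with made-up sample strings.

```python
import re

# Use 1: the group (wo) is optional, so both spellings match.
optional = re.compile(r'Bat(wo)?man')
short_form = optional.search('The Adventures of Batman').group()
long_form = optional.search('The Adventures of Batwoman').group()

# Use 2: a ? after a quantifier makes it non-greedy (shortest match).
greedy = re.search(r'(Ha){3,5}', 'HaHaHaHaHa').group()
lazy = re.search(r'(Ha){3,5}?', 'HaHaHaHaHa').group()
print(short_form, long_form, greedy, lazy)
```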
- In regular expressions, what is the difference between the + and * characters?
Answer:
In regular expressions, * means "match zero or more times", and + (plus sign) means "match one or more times".
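The difference shows up when the repeated group is absent. A small sketch with illustrative strings:

```python
import re

star = re.compile(r'Bat(wo)*man')  # (wo) zero or more times
plus = re.compile(r'Bat(wo)+man')  # (wo) one or more times

star_matches_batman = star.search('Batman') is not None
plus_matches_batman = plus.search('Batman') is not None
both_match_repeats = (star.search('Batwowowoman') is not None
                      and plus.search('Batwowowoman') is not None)
print(star_matches_batman, plus_matches_batman, both_match_repeats)
```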
- In a regular expression, what is the difference between {3} and {3,5}?
Answer:
In a regular expression, {3} means to match exactly 3 repetitions of the preceding group, and {3,5} means to match between 3 and 5 repetitions (3, 4, or 5).
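One way to see the two repetition counts is to test which lengths each pattern accepts in full. The sketch uses fullmatch() so that longer strings are not accepted through a partial match; the 'Ha' string is just a convenient sample.

```python
import re

exactly_three = re.compile(r'(Ha){3}')
three_to_five = re.compile(r'(Ha){3,5}')

# Collect the repetition counts (1 to 6) that each pattern fully matches.
accepted_exact = [n for n in range(1, 7) if exactly_three.fullmatch('Ha' * n)]
accepted_range = [n for n in range(1, 7) if three_to_five.fullmatch('Ha' * n)]
print(accepted_exact, accepted_range)
```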
- What do the \d, \w, and \s abbreviation character classes mean in regular expressions?
Answer:
In regular expressions, \d represents any digit from 0 to 9; \w represents any letter, digit, or underscore character (it can be thought of as matching a "word" character); and \s represents a space, tab, or newline character (it can be thought of as matching a "whitespace" character).
- What do the \D, \W, and \S abbreviation character classes mean in regular expressions?
Answer:
In regular expressions, \D means any character except digits 0 to 9; \W means any character except letters, digits, and underscores; \S means any character except spaces, tabs, and newlines.
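The shorthand classes and their uppercase complements can be compared on one sample string. The string below is invented for this sketch.

```python
import re

text = 'R2-D2 has 42 droid_parts\n'

digit_runs = re.findall(r'\d+', text)      # runs of digits
word_runs = re.findall(r'\w+', text)       # runs of letters, digits, underscores
space_chars = re.findall(r'\s', text)      # individual whitespace characters
non_digit_runs = re.findall(r'\D+', text)  # everything between the digits
non_word_chars = re.findall(r'\W', text)   # punctuation and whitespace
non_space_runs = re.findall(r'\S+', text)  # whitespace-separated chunks
print(digit_runs, word_runs, non_space_runs)
```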
14. What is the difference between .* and .*?
Answer:
.* is a greedy match of any text (except newlines): it matches as much as possible. .*? is the non-greedy version: it matches as little text as possible.
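The greedy and non-greedy forms can be compared on a string with two closing angle brackets. The sample string is illustrative.

```python
import re

text = '<To serve man> for dinner.>'

# Greedy: runs to the last '>' in the string.
greedy = re.search(r'<.*>', text).group()
# Non-greedy: stops at the first '>' it can.
lazy = re.search(r'<.*?>', text).group()
print(greedy)
print(lazy)
```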
- What is the character class syntax that matches all numbers and lowercase letters?
Answer:
[0-9a-z]
- How do I make a regular expression case-insensitive?
Answer:
You can pass re.IGNORECASE or re.I as the second parameter to re.compile().
- What does the character . usually match? If re.DOTALL is passed as the second argument to re.compile(), what will it match?
Answer:
The "." character matches any character except a newline. If re.DOTALL is passed as the second argument to re.compile(), it matches any character, including newlines.
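A small sketch of the re.DOTALL behavior on a two-line sample string (the text is made up for the example):

```python
import re

text = 'Serve the public trust.\nProtect the innocent.'

# Without re.DOTALL, '.*' stops at the first newline.
without_dotall = re.compile(r'.*').search(text).group()
# With re.DOTALL, '.' also matches the newline, so the match spans both lines.
with_dotall = re.compile(r'.*', re.DOTALL).search(text).group()
print(repr(without_dotall))
print(repr(with_dotall))
```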
- If numRegex = re.compile(r'\d+'), what does numRegex.sub('X', '12, drummers, 11 pipers, five rings, 3 hens') return?
Answer:
The string "X, drummers, X pipers, five rings, X hens"
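The substitution can be confirmed by running it. The snippet uses the same pattern and input string as the question; only the variable names are changed to snake_case.

```python
import re

num_regex = re.compile(r'\d+')
result = num_regex.sub('X', '12, drummers, 11 pipers, five rings, 3 hens')
print(result)
```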
- What does passing re.VERBOSE as the second argument to re.compile() allow you to do?
Answer:
Matching complex text patterns can require long, convoluted regular expressions. Passing re.VERBOSE as the second argument to re.compile() makes the pattern ignore whitespace and comments inside the regular expression string, so the pattern can be spread over several lines and annotated.
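To show what re.VERBOSE allows, the sketch below writes one pattern in compact form and again in annotated multi-line form, then checks that both behave the same. The pattern and sample number are chosen for illustration.

```python
import re

compact = re.compile(r'(\d{3})-(\d{3}-\d{4})')

# Same pattern, spread over lines with comments; re.VERBOSE ignores
# the whitespace and everything after each '#'.
verbose = re.compile(r'''
    (\d{3})        # area code
    -              # separator
    (\d{3}-\d{4})  # main number
''', re.VERBOSE)

sample = '415-555-4242'
verbose_groups = verbose.search(sample).groups()
same_groups = compact.search(sample).groups() == verbose_groups
print(verbose_groups, same_groups)
```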
20. Write a regular expression to match numbers with a comma every 3 digits. It must match the following numbers:
But it will not match the following numbers:
- '12,34,567' (only two digits between the commas)
- '1234' (missing comma)
Answer:
re.compile(r'''
    (?<![\d,])     # not preceded by a digit or a comma
    \d{1,3}        # one to three leading digits
    (?:,\d{3})*    # zero or more groups of a comma and exactly three digits
    (?![\d,])      # not followed by a digit or a comma
''', re.VERBOSE)
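A quick way to check a candidate pattern for this exercise is to run it against sample inputs. The sketch below uses a lookaround-based pattern that rejects a number when a digit or comma sits directly before or after it. The three well-formed numbers, the sample sentence, and the helper name matches_whole are assumptions made for this illustration.

```python
import re

comma_number_regex = re.compile(r'''
    (?<![\d,])     # not preceded by a digit or a comma
    \d{1,3}        # one to three leading digits
    (?:,\d{3})*    # zero or more groups of a comma and exactly three digits
    (?![\d,])      # not followed by a digit or a comma
''', re.VERBOSE)


def matches_whole(text):
    """Return True when the entire string is one comma-grouped number."""
    return comma_number_regex.fullmatch(text) is not None


accepted = [matches_whole(s) for s in ('42', '1,234', '6,368,745')]
rejected = [matches_whole(s) for s in ('12,34,567', '1234')]

# The lookarounds also keep search/findall from matching inside bad numbers.
in_sentence = comma_number_regex.findall(
    'Population 6,368,745 in 1234 and 12,34,567')
print(accepted, rejected, in_sentence)
```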
21. Write a regular expression that matches the full name of someone whose last name is Nakamoto. You can assume that the first name before it is always a single word that starts with a capital letter. This regular expression must match:
- 'Satoshi Nakamoto'
- 'Alice Nakamoto'
- 'RoboCop Nakamoto'
But it does not match:
- 'satoshi Nakamoto' (the first name is not capitalized)
- 'Mr. Nakamoto' (the preceding word contains a non-letter character)
- 'Nakamoto' (no first name)
- 'Satoshi nakamoto' (the last name is not capitalized)
Answer:
re.compile(r'[A-Z][a-zA-Z]*\sNakamoto')
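The pattern can be checked against the cases listed in the exercise. This sketch assumes the character class is meant to be the range [A-Z] (one uppercase letter); the list and variable names are introduced here for illustration.

```python
import re

name_regex = re.compile(r'[A-Z][a-zA-Z]*\sNakamoto')

should_match = ['Satoshi Nakamoto', 'Alice Nakamoto', 'RoboCop Nakamoto']
should_not_match = ['satoshi Nakamoto', 'Mr. Nakamoto',
                    'Nakamoto', 'Satoshi nakamoto']

match_results = [name_regex.search(s) is not None for s in should_match]
reject_results = [name_regex.search(s) is None for s in should_not_match]
print(match_results, reject_results)
```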
22. Write a regular expression to match a sentence whose first word is Alice, Bob, or Carol; whose second word is eats, pets, or throws; and whose third word is apples, cats, or baseballs. The sentence ends with a period. The regular expression is case-insensitive. It must match:
- 'Alice eats apples.'
- 'Bob pets cats.'
- 'Carol throws baseballs.'
- 'Alice throws Apples.'
- 'BOB EATS CATS.'
But it does not match:
- 'RoboCop eats apples.'
- 'ALICE THROWS FOOTBALLS.'
- 'Carol eats 7 cats.'
Answer:
re.compile(r'(?:Alice|Bob|Carol)\s(?:eats|pets|throws)\s(?:apples|cats|baseballs)\.', re.I)
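As with the previous exercises, the answer can be verified against the listed sentences. The sketch below uses a single-line form of the pattern with re.IGNORECASE; the list and variable names are assumptions made for the example.

```python
import re

sentence_regex = re.compile(
    r'(?:Alice|Bob|Carol)\s(?:eats|pets|throws)\s(?:apples|cats|baseballs)\.',
    re.IGNORECASE,
)

should_match = ['Alice eats apples.', 'Bob pets cats.',
                'Carol throws baseballs.', 'Alice throws Apples.',
                'BOB EATS CATS.']
should_not_match = ['RoboCop eats apples.', 'ALICE THROWS FOOTBALLS.',
                    'Carol eats 7 cats.']

match_results = [sentence_regex.search(s) is not None for s in should_match]
reject_results = [sentence_regex.search(s) is None for s in should_not_match]
print(all(match_results), all(reject_results))
```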