Demystifying Regular Expressions in Python with Examples
Regular expressions (regex for short) are a powerful and versatile tool for pattern matching and text manipulation in Python. They let you search, extract, and transform strings based on patterns. A regular expression is a sequence of characters that defines a search pattern. Regex support is a standard feature of many programming languages, including Python.
Here's a detailed explanation of regex in Python with examples:
### Basic Syntax:
In Python, you work with regular expressions through the built-in `re` module, which provides functions for searching, matching, and replacing text.
1. **Import the `re` module:**
import re
2. **Create a regex pattern:**
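To define a regex pattern, you write a string in which certain special characters stand for matching rules. For example, the dot (`.`) matches any single character except a newline (by default), and the asterisk (`*`) matches zero or more occurrences of the preceding element (a character, character class, or group). Patterns are usually written as raw strings (`r"..."`) so that backslashes reach the regex engine unchanged.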
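As a quick illustration of the two metacharacters just described, here is a minimal sketch. The sample strings and variable names are made up for this example.

```python
import re

# "." matches any single character except a newline (by default).
dot_matches = re.findall(r"c.t", "cat cot cut ct")
print(dot_matches)   # ['cat', 'cot', 'cut']

# "*" repeats the preceding element zero or more times.
star_matches = re.findall(r"ab*", "a ab abbb b")
print(star_matches)  # ['a', 'ab', 'abbb']
```

Note that `ct` is not matched by `c.t`, because the dot must consume exactly one character.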
### Matching:
You can use regular expressions to find patterns within strings.
import re
text = "Hello, my email is john@example.com"
pattern = r"\b\w+@\w+\.\w+\b" # Matches simple email addresses (not every valid address)
match = re.search(pattern, text)
if match:
    print("Found:", match.group())
else:
    print("No match found")
In this example:
- `\b` represents a word boundary.
- `\w+` matches one or more word characters (letters, digits, or underscores).
- `@` matches the "@" symbol.
- `\.` matches a period (escaped because it's a special character).
- `re.search()` searches the `text` for the first occurrence of the pattern.
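Since `re.search()` stops at the first occurrence, a text with several addresses needs `re.findall()` or `re.finditer()` instead. The sketch below reuses the same pattern on made-up sample strings and also shows one of its limits: it does not handle dots in the part before the "@" or multi-level domains.

```python
import re

pattern = r"\b\w+@\w+\.\w+\b"
text = "Contact john@example.com or alice@example.org for details."

# findall returns every non-overlapping match as a list of strings.
emails = re.findall(pattern, text)
print(emails)  # ['john@example.com', 'alice@example.org']

# finditer yields match objects, which also carry positions.
spans = [(m.start(), m.end()) for m in re.finditer(pattern, text)]
print(spans)

# Limitation: the simple pattern only captures part of this address.
partial = re.findall(pattern, "john.doe@mail.example.com")
print(partial)  # ['doe@mail.example']
```

For real-world input, a stricter pattern or a dedicated validation library is usually a better choice than this simplified expression.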
### Extracting:
You can extract matched portions of a string using groups.
import re
text = "Date of birth: 2023-10-02"
pattern = r"(\d{4})-(\d{2})-(\d{2})" # Matches date in yyyy-mm-dd format
match = re.search(pattern, text)
if match:
    year, month, day = match.groups()
    print(f"Year: {year}, Month: {month}, Day: {day}")
else:
    print("No match found")
In this example, `(\d{4})` captures the year, the first `(\d{2})` captures the month, and the second `(\d{2})` captures the day. `match.groups()` returns the three captured values as a tuple of strings.
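Groups can also be given names with the `(?P<name>...)` syntax, which avoids relying on their position. The following is a small sketch of the same date example; the group names `year`, `month`, and `day` are chosen here for illustration.

```python
import re

text = "Date of birth: 2023-10-02"
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"

match = re.search(pattern, text)
if match:
    # groupdict() maps each group name to its captured string.
    parts = match.groupdict()
    print(parts)                # {'year': '2023', 'month': '10', 'day': '02'}
    print(match.group("year"))  # 2023
else:
    print("No match found")
```

The captured values are still strings, so convert them with `int()` if you need numbers.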
### Replacing:
You can use regular expressions to replace specific patterns in a string.
import re
text = "Hello, my name is John. Hello, my name is Alice."
pattern = r"Hello, my name is (\w+)\." # The final period is escaped so it matches a literal "."
new_text = re.sub(pattern, r"Hi, my name is \1.", text) # \1 reinserts the captured name
print(new_text)
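In this example, `re.sub()` replaces each greeting with "Hi, my name is {name}.", where the backreference `\1` reinserts the name captured by the first group. The name itself is kept; only the text around it changes.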
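`re.sub()` also accepts a function as the replacement and a `count` argument that limits how many matches are replaced. The sketch below shows both on the same text; the helper name `shout_name` is hypothetical.

```python
import re

text = "Hello, my name is John. Hello, my name is Alice."
pattern = r"Hello, my name is (\w+)\."

# A callable receives each match object and returns the replacement text.
def shout_name(match):
    return f"Hi, my name is {match.group(1).upper()}."

shouted = re.sub(pattern, shout_name, text)
print(shouted)     # Hi, my name is JOHN. Hi, my name is ALICE.

# count=1 replaces only the first match and leaves the rest untouched.
first_only = re.sub(pattern, r"Hi, my name is \1.", text, count=1)
print(first_only)  # Hi, my name is John. Hello, my name is Alice.
```

A function replacement is handy when the new text has to be computed from the match rather than written as a fixed template.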
### Flags:
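You can use flags to modify regex behavior. Common flags include `re.IGNORECASE` (case-insensitive matching), `re.MULTILINE` (makes `^` and `$` match at the start and end of every line, not just of the whole string), and `re.DOTALL` (lets `.` match newlines as well). Flags are combined with the `|` operator.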
import re
text = "The quick brown fox\nJumps over the lazy dog"
pattern = r"the (.+?) fox"
match = re.search(pattern, text, re.IGNORECASE | re.MULTILINE)
if match:
    print("Found:", match.group(1))
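In this example, the `re.IGNORECASE` flag makes the pattern case-insensitive, so `the` matches "The" and the output is "Found: quick brown". The `re.MULTILINE` flag does not make a pattern match across lines; it only changes `^` and `$` so that they match at the start and end of each line. Because this pattern contains neither anchor, `re.MULTILINE` has no effect here. To let `.` match across line breaks, use `re.DOTALL`.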
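To see what the line-related flags actually change, here is a minimal sketch on the same two-line text. The patterns and variable names are made up for this comparison.

```python
import re

text = "The quick brown fox\nJumps over the lazy dog"

# Without MULTILINE, ^ matches only at the very start of the string.
no_flag = re.findall(r"^\w+", text)
print(no_flag)    # ['The']

# With MULTILINE, ^ also matches right after each newline.
multiline = re.findall(r"^\w+", text, re.MULTILINE)
print(multiline)  # ['The', 'Jumps']

# "." stops at a newline unless DOTALL is set.
no_dotall = re.search(r"fox.+dog", text)
print(no_dotall)  # None

dotall = re.search(r"fox.+dog", text, re.DOTALL)
print(repr(dotall.group()))
```

In short, `re.MULTILINE` affects the anchors `^` and `$`, while `re.DOTALL` affects what `.` can match.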