When working with text data in Python, performing operations like checking for a specific word or extracting a particular segment is fundamental to almost every program.
Do you find yourself asking:
- "How do I cut out text between two specific characters?"
- "How can I find strings in a list that match a certain pattern?"
- "How do I extract just the last few characters of a string?"
Python's built-in string features handle each of these in a line or two. In this guide, we'll break down the essential techniques for checking, extracting, searching, and replacing substrings in a way that's easy to follow for beginners and pros alike.
1. Checking if a Substring "Exists"
Checking for a keyword is one of the most frequent operations in conditional logic. Python provides the intuitive in operator and specific methods for checking the start or end of a string.
The Simple in Operator
The most Pythonic way to check if a string contains another string is using the in operator.
text = "Python is a powerful language for data analysis."
keyword = "data analysis"
if keyword in text:
    print(f"Found: '{keyword}'")
else:
    print("Not found.")
The result is a boolean (True/False), making it perfect for if statements. To check that a word is absent, use "not in", which reads naturally inside a condition.
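One detail worth knowing is that "in" compares characters exactly, so it is case-sensitive. The sketch below reuses the sample sentence from above; the variable names are only illustrative.

```python
text = "Python is a powerful language for data analysis."

# `in` is case-sensitive: lowercase "python" is not a substring of text
has_exact = "python" in text

# Lowercasing the text first gives a simple case-insensitive check
has_any_case = "python" in text.lower()

# `not in` expresses absence directly
is_missing = "Java" not in text

print(has_exact, has_any_case, is_missing)  # False True True
```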
Using startswith and endswith
To check if a string matches a specific pattern at the beginning or end, these dedicated methods are extremely useful.
filename = "report_2026_analysis.csv"
# Check prefix
if filename.startswith("report"):
    print("This is a report file.")
# Check suffix
if filename.endswith(".csv"):
    print("This is a CSV format file.")
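These two methods also cover the "find strings in a list that match a pattern" question from the introduction when the pattern is a fixed prefix or suffix. The following is a minimal sketch; the file names are made up for illustration.

```python
# Hypothetical file names, used only for this example
filenames = [
    "report_2026_analysis.csv",
    "report_2025.xlsx",
    "notes.txt",
    "summary.csv",
]

# endswith() accepts a tuple and returns True if any suffix matches
data_files = [name for name in filenames if name.endswith((".csv", ".xlsx"))]

# Combine both checks to keep only CSV reports
csv_reports = [
    name for name in filenames
    if name.startswith("report") and name.endswith(".csv")
]

print(data_files)
print(csv_reports)  # ['report_2026_analysis.csv']
```

startswith() accepts a tuple of prefixes in the same way.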
2. Extraction: The Power of "Slicing"
Extracting a specific range of characters (e.g., from the 3rd to the 8th character) is called "Slicing."
Basic Slicing Logic
Slicing uses the format [start:end]. Remember: Python indices start at 0, and the character at the "end" index is NOT included.
s = "Programming"
# Extract from index 0 to 3 (0, 1, 2)
print("First 3 chars:", s[0:3]) # Pro
# Omit start to begin from the beginning
print("Up to index 5:", s[:5]) # Progr
# Omit end to go to the very end
print("From index 2 onwards:", s[2:]) # ogramming
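To make the "end is not included" rule concrete, here is a small sketch of the "3rd to 8th character" case mentioned earlier, plus what happens when the end index runs past the string. Variable names are illustrative.

```python
s = "Programming"

# 3rd to 8th character: the 3rd character sits at index 2,
# and end=8 stops just before index 8
middle = s[2:8]
print(middle)  # ogramm

# The slice length is always end - start (when both are in range)
print(len(middle) == 8 - 2)  # True

# Slices never raise IndexError; an out-of-range end is clamped
tail = s[5:100]
print(tail)  # amming
```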
Negative Indexing (From the end)
To count from the end of the string, use negative numbers. This is useful when the string length varies from one input to the next.
url = "https://example.com/item/12345"
# Extract the last 5 characters
item_id = url[-5:]
print("Item ID:", item_id) # 12345
# Remove the last character
prefix = url[:-1]
print("All but last char:", prefix)
In Python, the last character is -1, the second to last is -2, and so on.
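As a quick check of that rule, the sketch below compares negative indices with their positive equivalents and uses a negative slice to split off a file extension. The file name is taken from the earlier example; it assumes the extension is exactly four characters long.

```python
s = "Programming"

last = s[-1]          # same as s[len(s) - 1]
second_last = s[-2]   # same as s[len(s) - 2]
print(last, second_last)  # g n

filename = "report_2026_analysis.csv"

# Assumes a 4-character extension such as ".csv"
extension = filename[-4:]
stem = filename[:-4]
print(extension)  # .csv
print(stem)       # report_2026_analysis
```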
3. Searching: Locating Substrings
If you need to know where a word starts, you need a search method.
find vs. index
Both locate a substring, but they handle missing items differently:
- find(): Returns the starting index if found, and -1 if not found.
- index(): Raises a ValueError if not found.
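Generally, find() is preferred when a missing substring is a normal case, because it never raises an exception. Just remember to check for -1 before using the result. Use index() when you specifically want an error.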
sentence = "Python is easy to learn and has many libraries."
pos = sentence.find("easy")
if pos != -1:
    print(f"The word 'easy' starts at index {pos}.")
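The difference between the two methods is easiest to see side by side. This sketch reuses the sentence above and also shows why the -1 check matters: an unchecked -1 is a valid index, so it silently produces a wrong slice instead of an error.

```python
sentence = "Python is easy to learn and has many libraries."

pos_found = sentence.find("easy")
pos_missing = sentence.find("hard")
print(pos_found, pos_missing)  # 10 -1

# index() raises ValueError instead of returning -1
try:
    sentence.index("hard")
    index_raised = False
except ValueError:
    index_raised = True
print(index_raised)  # True

# Pitfall: using an unchecked -1 as a slice bound just drops the last character
wrong_slice = sentence[:pos_missing]
print(wrong_slice.endswith("libraries"))  # True (the final "." is gone)
```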
Dynamic Extraction
By combining search and slicing, you can extract text up to a certain symbol, such as an email username.
email = "suzuki_ichiro@example.jp"
at_pos = email.find("@")
if at_pos != -1:
    user_name = email[:at_pos]
    print("Username:", user_name)
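The same idea extends to the "text between two specific characters" question from the introduction. Below is a minimal sketch; the helper name between() is hypothetical, not a built-in, and it returns None when either marker is missing.

```python
def between(text, start_marker, end_marker):
    """Return the text between the first start_marker and the next end_marker.

    Hypothetical helper for illustration; returns None if either marker is missing.
    """
    start = text.find(start_marker)
    if start == -1:
        return None
    content_start = start + len(start_marker)
    # The second argument to find() tells it where to begin searching
    end = text.find(end_marker, content_start)
    if end == -1:
        return None
    return text[content_start:end]


email = "suzuki_ichiro@example.jp"
domain_name = between(email, "@", ".")
print(domain_name)  # example

no_match = between("no markers here", "@", ".")
print(no_match)  # None
```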
4. Advanced: Regex and Replacement
Extracting "Enclosed" Text
For complex patterns like "extracting text inside brackets," use the re (regular expression) module.
import re
text = "Price: [1,200 JPY], Shipping: [500 JPY]"
# Extract everything between [ ] using non-greedy match (?)
results = re.findall(r"\[(.*?)\]", text)
for price in results:
    print("Found:", price)
The ? in .*? is crucial. It ensures "shortest match" so that Python extracts content from each pair of brackets individually instead of merging them into one big block.
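To see the effect, here is a small comparison of the greedy and non-greedy versions of the pattern on the same sample text. The variable names are illustrative.

```python
import re

text = "Price: [1,200 JPY], Shipping: [500 JPY]"

# Greedy: .* runs to the LAST closing bracket, merging both pairs
greedy = re.findall(r"\[(.*)\]", text)
print(greedy)  # ['1,200 JPY], Shipping: [500 JPY']

# Non-greedy: .*? stops at the FIRST closing bracket it can
lazy = re.findall(r"\[(.*?)\]", text)
print(lazy)  # ['1,200 JPY', '500 JPY']
```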
Replacing Substrings
Use the replace() method to swap parts of a string. It replaces every occurrence by default and returns a new string, leaving the original unchanged.
original = "The weather is sunny. Let's enjoy the sunny day."
# Replace "sunny" with "rainy"
new_text = original.replace("sunny", "rainy")
print(new_text)
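replace() also takes an optional third argument that limits how many occurrences are replaced. A short sketch using the same sentence:

```python
original = "The weather is sunny. Let's enjoy the sunny day."

# Default: every occurrence is replaced
all_replaced = original.replace("sunny", "rainy")

# count=1 (third positional argument): only the first occurrence is replaced
first_only = original.replace("sunny", "rainy", 1)
print(first_only)  # The weather is rainy. Let's enjoy the sunny day.

# Strings are immutable, so the original is untouched
print(original.count("sunny"))  # 2
```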
Conclusion
Mastering substrings turns messy text data into structured information.
- Use in for simple checks.
- Use Slicing [:] for position-based extraction.
- Use find() to locate starting points dynamically.
- Use Regex for complex "enclosed" patterns.
Originally published at: [https://code-izumi.com/python/substring/]