DEV Community

ライフポータル

Originally published at code-izumi.com

Mastering Python Substrings: Guide to Extract, Check, Search, and Replace

When working with text data in Python, performing operations like checking for a specific word or extracting a particular segment is fundamental to almost every program.

Do you find yourself asking:

  • "How do I cut out text between two specific characters?"
  • "How can I find strings in a list that match a certain pattern?"
  • "How do I extract just the last few characters of a string?"

Python's built-in string features handle each of these tasks in a line or two. This guide breaks down the essential techniques for checking, extracting, searching, and replacing substrings, at a level suited to beginners and experienced developers alike.


1. Checking if a Substring "Exists"

Checking for a keyword is one of the most frequent operations in conditional logic. Python provides the intuitive in operator and specific methods for checking the start or end of a string.

The Simple in Operator

The most Pythonic way to check if a string contains another string is using the in operator.

text = "Python is a powerful language for data analysis."
keyword = "data analysis"

if keyword in text:
    print(f"Found: '{keyword}'")
else:
    print("Not found.")

The expression evaluates to a boolean (True or False), so it fits directly into an if statement. To check that a word is absent, use not in. Note that the check is case-sensitive.
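As a small sketch of the absence check, the example below uses not in and then shows one common way to make the check ignore case: normalizing both sides with casefold(). The sample strings are illustrative only.

```python
text = "Python is a powerful language for data analysis."

# `not in` reads naturally when testing for absence
missing = "Java" not in text
print(missing)  # True

# `in` is case-sensitive: lowercase "python" does not match "Python"
print("python" in text)  # False

# Normalize both sides for a case-insensitive check
found_ignoring_case = "python".casefold() in text.casefold()
print(found_ignoring_case)  # True
```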

Using startswith and endswith

To check whether a string begins or ends with specific text, use the dedicated startswith() and endswith() methods.

filename = "report_2026_analysis.csv"

# Check prefix
if filename.startswith("report"):
    print("This is a report file.")

# Check suffix
if filename.endswith(".csv"):
    print("This is a CSV format file.")
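Both methods also accept a tuple of candidates, which pairs well with a list comprehension when you need to pick matching strings out of a list. The following is a minimal sketch; the file names are made up for illustration.

```python
filenames = ["report_2026_analysis.csv", "notes.txt", "report_2025.xlsx", "summary.csv"]

# startswith/endswith accept a tuple: True if any candidate matches
spreadsheets = [name for name in filenames if name.endswith((".csv", ".xlsx"))]
print(spreadsheets)
# ['report_2026_analysis.csv', 'report_2025.xlsx', 'summary.csv']

# Combine conditions to narrow the list further
csv_reports = [
    name for name in filenames
    if name.startswith("report") and name.endswith(".csv")
]
print(csv_reports)  # ['report_2026_analysis.csv']
```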

2. Extraction: The Power of "Slicing"

Extracting a specific range of characters (e.g., from the 3rd to the 8th character) is called "Slicing."

Basic Slicing Logic

Slicing uses the format [start:end]. Remember: Python indices start at 0, and the character at the "end" index is NOT included.

s = "Programming"

# Extract indices 0, 1, 2 (the end index 3 is excluded)
print("First 3 chars:", s[0:3]) # Pro

# Omit start to begin from the beginning
print("Up to index 5:", s[:5]) # Progr

# Omit end to go to the very end
print("From index 2 onwards:", s[2:]) # ogramming
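One detail worth knowing: a slice that runs past the end of the string is quietly clamped, whereas indexing a single out-of-range position raises IndexError. The sketch below illustrates this and also maps the "3rd to 8th character" wording onto zero-based indices.

```python
s = "Programming"  # 11 characters, valid indices 0..10

# A slice that runs past the end is clamped, not an error
print(s[8:100])  # ing
print(s[50:])    # prints an empty line

# Indexing a single out-of-range position does raise
try:
    s[50]
except IndexError as exc:
    print("IndexError:", exc)

# The 3rd through 8th characters (1-based) are indices 2..7, so slice [2:8]
third_to_eighth = s[2:8]
print(third_to_eighth)  # ogramm
```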

Negative Indexing (From the end)

To count from the end of the string, use negative numbers. This is useful when the string length varies and you cannot hard-code a position.

url = "https://example.com/item/12345"

# Extract the last 5 characters
item_id = url[-5:]
print("Item ID:", item_id) # 12345

# Remove the last character
prefix = url[:-1]
print("All but last char:", prefix)

In Python, the last character is -1, the second to last is -2, and so on.
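A short sketch of negative indices in practice. The helper last_n is hypothetical (not part of Python) and exists only to show one edge case: because -0 equals 0, s[-0:] returns the whole string rather than an empty one.

```python
def last_n(s: str, n: int) -> str:
    """Return the last n characters of s (hypothetical helper)."""
    if n <= 0:
        return ""   # s[-0:] would be s[0:], i.e. the whole string
    return s[-n:]   # if n exceeds len(s), the slice returns all of s

word = "Programming"
print(word[-1])         # g
print(word[-2])         # n
print(last_n(word, 4))  # ming
print(last_n(word, 0))  # prints an empty line
print(word[-0:])        # Programming
```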


3. Searching: Locating Substrings

If you need to know where a word starts, you need a search method.

find vs. index

Both locate a substring, but they handle missing items differently:

  • find(): Returns the starting index if found, and -1 if not found.
  • index(): Raises a ValueError if not found.

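As a rule of thumb, find() suits cases where a missing substring is expected and you check for -1 explicitly, while index() suits cases where a missing substring is a bug that should fail loudly.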

sentence = "Python is easy to learn and has many libraries."
pos = sentence.find("easy")

if pos != -1:
    print(f"The word 'easy' starts at index {pos}.")
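For comparison, here is a minimal sketch of the index() variant with try/except, followed by the main pitfall of find(): an unchecked -1 is itself a valid index and silently points at the last character.

```python
sentence = "Python is easy to learn and has many libraries."

# index() returns the same position find() would when the word exists
print(sentence.index("easy"))  # 10

# When the word is missing, index() raises instead of returning -1
try:
    pos = sentence.index("difficult")
except ValueError:
    pos = None
    print("'difficult' does not appear in the sentence.")

# Pitfall: an unchecked -1 from find() is a valid index (the last character)
unchecked = sentence.find("difficult")
print(sentence[unchecked])  # .
```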

Dynamic Extraction

By combining search and slicing, you can extract text up to a certain symbol, such as an email username.

email = "suzuki_ichiro@example.jp"
at_pos = email.find("@")

if at_pos != -1:
    user_name = email[:at_pos]
    print("Username:", user_name)
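The same idea extends to cutting out text between two markers. The sketch below assumes you want the text between the first start marker and the next end marker after it; between is a hypothetical helper name. It also shows str.partition(), a built-in alternative for splitting on the first separator.

```python
from typing import Optional

def between(text: str, start: str, end: str) -> Optional[str]:
    """Return the text between the first `start` and the next `end`, or None."""
    start_pos = text.find(start)
    if start_pos == -1:
        return None
    content_start = start_pos + len(start)
    # Search for the end marker only after the start marker
    end_pos = text.find(end, content_start)
    if end_pos == -1:
        return None
    return text[content_start:end_pos]

email = "suzuki_ichiro@example.jp"
print(between(email, "@", "."))  # example

# partition() splits on the first separator and never raises
user_name, sep, domain = email.partition("@")
print(user_name)  # suzuki_ichiro
print(domain)     # example.jp
```

The second argument to find() sets where the search begins, which keeps the end marker from matching something that appears before the start marker.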

4. Advanced: Regex and Replacement

Extracting "Enclosed" Text

For complex patterns like "extracting text inside brackets," use the re (regular expression) module.

import re

text = "Price: [1,200 JPY], Shipping: [500 JPY]"

# Extract everything between [ ] using non-greedy match (?)
results = re.findall(r"\[(.*?)\]", text)

for price in results:
    print("Found:", price)

The ? in .*? is crucial. It makes the match non-greedy ("shortest match"), so each pair of brackets is extracted individually instead of everything from the first [ to the last ] being captured as one big block.
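To see the difference concretely, this sketch runs the greedy and non-greedy patterns against the same text. The third pattern, a negated character class, is an alternative that avoids the issue without the ? modifier.

```python
import re

text = "Price: [1,200 JPY], Shipping: [500 JPY]"

# Greedy: .* runs to the LAST closing bracket, merging both pairs
greedy = re.findall(r"\[(.*)\]", text)
print(greedy)  # ['1,200 JPY], Shipping: [500 JPY']

# Non-greedy: .*? stops at the FIRST closing bracket after each opening one
non_greedy = re.findall(r"\[(.*?)\]", text)
print(non_greedy)  # ['1,200 JPY', '500 JPY']

# Alternative: match "anything except ]", which needs no non-greedy modifier
negated = re.findall(r"\[([^\]]*)\]", text)
print(negated)  # ['1,200 JPY', '500 JPY']
```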

Replacing Substrings

Use the replace() method to swap parts of a string. It returns a new string with every occurrence replaced; the original string is left unchanged.

original = "The weather is sunny. Let's enjoy the sunny day."

# Replace "sunny" with "rainy"
new_text = original.replace("sunny", "rainy")
print(new_text)
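A brief sketch of two behaviors that are easy to overlook: replace() never modifies the original string, and its optional third argument caps how many occurrences are replaced.

```python
original = "The weather is sunny. Let's enjoy the sunny day."

# replace() returns a new string; the original is left untouched
all_replaced = original.replace("sunny", "rainy")
print(original == all_replaced)  # False

# The optional third argument limits how many occurrences are replaced
first_only = original.replace("sunny", "rainy", 1)
print(first_only)
# The weather is rainy. Let's enjoy the sunny day.

# No match is not an error: the text comes back unchanged
print(original.replace("snowy", "rainy") == original)  # True
```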

Conclusion

Mastering substrings turns messy text data into structured information.

  1. Use in for simple checks.
  2. Use Slicing [:] for position-based extraction.
  3. Use find() to locate starting points dynamically.
  4. Use Regex for complex "enclosed" patterns.
  5. Use replace() to substitute one substring for another.

Originally published at: https://code-izumi.com/python/substring/
