In Playwright test automation with Python, Pandas is often used alongside the browser driver to support data-driven testing, process test results, manage datasets for parameterized tests, and analyze scraped or logged data. Playwright handles the browser interactions; Pandas adds data manipulation, especially for reading test inputs from CSV files, aggregating results, and cleaning outputs.
1. pd.read_csv()
What it does: Loads data from a CSV file into a DataFrame.
Why it's common in Playwright automation: Essential for data-driven tests; e.g., reading test cases (URLs, credentials) from CSV for parameterized Playwright runs.
Example:
import pandas as pd
df = pd.read_csv('test_data.csv')  # Load URLs for Playwright page.goto()
# `page` is a Playwright Page object, e.g. the pytest-playwright `page` fixture
for url in df['url']:
    page.goto(url)
Pro Tip: Use usecols=['url', 'expected_text'] to load only relevant columns for efficiency.
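As a rough sketch of the usecols tip, the snippet below reads a small in-memory CSV instead of a real file. The CSV content and the owner/notes columns are made up for illustration; only url and expected_text come from the tip above.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for test_data.csv
csv_text = """url,expected_text,owner,notes
https://example.com,Example Domain,qa,smoke
https://example.org,Example Domain,qa,regression
"""

# usecols skips the columns the test run does not need
df = pd.read_csv(io.StringIO(csv_text), usecols=["url", "expected_text"])

# itertuples() yields one named tuple per row, handy for building test cases
test_cases = [(row.url, row.expected_text) for row in df.itertuples(index=False)]
print(test_cases)
```

A list of tuples like test_cases is a convenient shape to hand to a parameterized test, one tuple per case.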
2. pd.DataFrame()
What it does: Creates a DataFrame from lists, dicts, or other structures.
Why it's common: Builds DataFrames from Playwright-extracted data (e.g., table scrapes or test logs) for easy manipulation.
Example:
data = {'url': ['example.com'], 'status': [200], 'title': ['Example']}
df = pd.DataFrame(data) # Store Playwright page.title() results
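Pro Tip: page.query_selector_all() returns element handles, not dicts, so extract the fields you need from each element into a dict and pass the resulting list of dicts to pd.DataFrame().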
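Here is a minimal sketch of the list-of-dicts pattern. The rows are hard-coded and hypothetical; in a real script each dict would be filled from a scraped element (for example with its text content).

```python
import pandas as pd

# Hypothetical rows, shaped like what a scrape loop might collect
scraped_rows = [
    {"name": "Widget", "price": "$9.99"},
    {"name": "Gadget", "price": "$19.50"},
    {"name": "Gizmo"},  # price element was missing on the page
]

# Keys become columns; a key missing from a row becomes NaN in that cell
df = pd.DataFrame(scraped_rows)
print(df)
```

Because missing keys turn into NaN rather than raising an error, incomplete scrapes still load and can be cleaned up afterwards.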
3. df.to_csv()
What it does: Exports a DataFrame to a CSV file.
Why it's common: Saves test results, screenshot metadata, or scraped data after a Playwright run for reporting or CI/CD integration.
Example:
df.to_csv('test_results.csv', index=False) # Export after running Playwright tests
Pro Tip: Use index=False to avoid row indices in output files.
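To see what index=False does, this sketch writes to an in-memory buffer rather than a file and reads the result back. The results DataFrame and its passed column are invented for the example.

```python
import io
import pandas as pd

# Hypothetical results collected after a test run
results = pd.DataFrame(
    {"url": ["https://example.com", "https://example.org"], "passed": [True, False]}
)

buffer = io.StringIO()
results.to_csv(buffer, index=False)  # a file path works the same way
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back gives the same columns, with no extra index column
round_trip = pd.read_csv(io.StringIO(csv_text))
```

Without index=False, the output would gain an unnamed leading column that shows up as "Unnamed: 0" when the file is read back.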
4. df.head()
What it does: Displays the first n rows (default 5) of a DataFrame.
Why it's common: Quick preview of loaded test data or intermediate results during script debugging.
Example:
print(df.head()) # Inspect CSV data before feeding to Playwright
Pro Tip: Pair with df.info() for a full data summary.
5. df.loc[]
What it does: Selects rows/columns by labels or boolean conditions.
Why it's common: Filters specific test cases (e.g., by browser type) for targeted Playwright runs.
Example:
filtered = df.loc[df['browser'] == 'chromium']  # Run only Chromium tests
for _, row in filtered.iterrows():
    page.goto(row['url'])
Pro Tip: Use boolean masks like df.loc[df['status'] == 'fail'] for failure analysis.
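Masks can also be combined, and .loc can pick columns at the same time. A small sketch with made-up run data:

```python
import pandas as pd

# Hypothetical run log; column names follow the examples above
runs = pd.DataFrame(
    {
        "url": ["https://example.com", "https://example.org", "https://example.net"],
        "browser": ["chromium", "firefox", "chromium"],
        "status": ["pass", "fail", "fail"],
    }
)

# Combine conditions with & (and) or | (or); each condition needs parentheses
mask = (runs["browser"] == "chromium") & (runs["status"] == "fail")

# First argument selects rows, second selects columns
chromium_failures = runs.loc[mask, ["url", "status"]]
print(chromium_failures)
```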
6. df.apply()
What it does: Applies a function along an axis of the DataFrame.
Why it's common: Processes scraped text or computes metrics (e.g., response times) from Playwright traces.
Example:
df['clean_price'] = df['price'].apply(lambda x: float(x.replace('$', ''))) # Clean scraped prices
Pro Tip: Use axis=1 for row-wise operations in test data transformation.
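The sketch below shows the axis=1 form next to a vectorized alternative for simple string cleanup. The expected_max column and the budget check are hypothetical, added only to give the row-wise function something to compare.

```python
import pandas as pd

# Hypothetical scraped prices plus an invented per-row limit
df = pd.DataFrame(
    {"price": ["$9.99", "$1,250.00"], "expected_max": [10.0, 1000.0]}
)

# Vectorized cleanup: strip "$" and thousands separators, then convert to float
df["clean_price"] = (
    df["price"]
    .str.replace("$", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(float)
)

# Row-wise check with axis=1: the function receives each row as a Series
df["within_budget"] = df.apply(
    lambda row: row["clean_price"] <= row["expected_max"], axis=1
)
print(df)
```

For single-column string cleanup the .str methods tend to be faster than apply; axis=1 is mainly useful when a calculation needs several columns of the same row.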
7. df.groupby()
What it does: Groups data by columns and applies aggregation.
Why it's common: Aggregates test results by category (e.g., pass/fail rates per endpoint) for reporting.
Example:
summary = df.groupby('url')['status'].agg(['count', 'mean'])  # Runs and pass rate per URL, assuming status is 1 (pass) or 0 (fail)
Pro Tip: Pass a dict such as .agg({'pass': 'sum', 'total': 'count'}) to apply a different aggregation to each column; the keys must be existing column names.
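One way to get a readable pass-rate report is named aggregation, where each output column is given as name=(source column, function). This is a sketch on invented data, with passed stored as 1 or 0.

```python
import pandas as pd

# Hypothetical results: passed is 1 for a passing run, 0 for a failing one
results = pd.DataFrame(
    {
        "url": ["/login", "/login", "/cart", "/cart", "/cart"],
        "passed": [1, 0, 1, 1, 1],
    }
)

# Named aggregation: output column = (input column, aggregation function)
summary = results.groupby("url").agg(
    total=("passed", "count"),
    passes=("passed", "sum"),
    pass_rate=("passed", "mean"),
)
print(summary)
```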
8. df.fillna()
What it does: Fills missing (NaN) values with a specified value or method.
Why it's common: Handles incomplete data from flaky Playwright interactions (e.g., missing titles).
Example:
df['title'] = df['title'].fillna('N/A') # Fill missing page titles
Pro Tip: Use df.ffill() for forward-filling time-series test logs; the older fillna(method='ffill') form is deprecated in recent pandas releases.
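A short sketch of both fill styles on a made-up step log: forward-filling a value that carries over between steps, and filling a constant where "missing" simply means "nothing happened".

```python
import pandas as pd

# Hypothetical step log with gaps (None is treated as missing)
log = pd.DataFrame(
    {
        "step": [1, 2, 3, 4],
        "page_title": ["Home", None, None, "Checkout"],
        "error": [None, "timeout", None, None],
    }
)

# Carry the last known title forward until a new one is recorded
log["page_title"] = log["page_title"].ffill()

# Replace missing errors with a constant marker
log["error"] = log["error"].fillna("none")
print(log)
```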
9. df.drop_duplicates()
What it does: Removes duplicate rows based on columns.
Why it's common: Cleans datasets for unique test scenarios, avoiding redundant Playwright executions.
Example:
df = df.drop_duplicates(subset=['url']) # Ensure unique URLs
Pro Tip: keep='first' (the default) retains the first occurrence; use keep='last' to keep the most recent row instead.
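The choice of subset matters when the same URL is meant to run in several browsers. A sketch with invented test cases:

```python
import pandas as pd

# Hypothetical test matrix: the same URL appears for two browsers
cases = pd.DataFrame(
    {
        "url": ["https://example.com", "https://example.org", "https://example.com"],
        "browser": ["chromium", "chromium", "firefox"],
    }
)

# Deduplicating on url alone drops the firefox run of example.com
unique_urls = cases.drop_duplicates(subset=["url"])  # keep="first" is the default

# Deduplicating on both columns keeps every url/browser combination
unique_pairs = cases.drop_duplicates(subset=["url", "browser"])

print(len(unique_urls), len(unique_pairs))
```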
10. pd.merge()
What it does: Joins two DataFrames on a key column.
Why it's common: Combines baseline test data with Playwright results for comparison (e.g., expected vs. actual).
Example:
results = pd.merge(df_expected, df_actual, on='url', suffixes=('_exp', '_act'))
Pro Tip: how='inner' (the default) keeps only keys present in both DataFrames; switch to how='outer' when pages missing from either side should show up in validation scripts.
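For expected-vs-actual checks, an outer merge with indicator=True can flag both mismatches and pages that were never visited. The data below is invented; df_expected and df_actual reuse the names from the example above.

```python
import pandas as pd

# Hypothetical baseline and observed page titles
df_expected = pd.DataFrame(
    {"url": ["/home", "/about", "/pricing"], "title": ["Home", "About", "Pricing"]}
)
df_actual = pd.DataFrame(
    {"url": ["/home", "/about", "/blog"], "title": ["Home", "About us", "Blog"]}
)

# indicator=True adds a _merge column: "both", "left_only", or "right_only"
merged = pd.merge(
    df_expected, df_actual, on="url", how="outer",
    suffixes=("_exp", "_act"), indicator=True,
)

# Present on both sides but with different titles
mismatched = merged[
    (merged["_merge"] == "both") & (merged["title_exp"] != merged["title_act"])
]

# Expected but never seen in the actual results
missing = merged[merged["_merge"] == "left_only"]
print(mismatched[["url", "title_exp", "title_act"]])
print(missing[["url"]])
```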
Why These Functions in Playwright Contexts?
These functions shine in hybrid workflows: Playwright handles dynamic browser tasks, while Pandas manages static or semi-structured data. For instance, in data-driven pytest suites, read_csv loads inputs, apply processes outputs, and groupby generates reports. They're lightweight, integrate seamlessly with pytest fixtures, and scale for CI pipelines.
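To get started, install via pip install pandas playwright pytest-playwright, run playwright install to download the browser binaries, then experiment in a Jupyter Notebook with a simple Playwright scrape feeding into a DataFrame (in a notebook, use Playwright's async API, since the sync API does not run inside Jupyter's event loop).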