You have a spreadsheet of job postings and you need to filter it down to roles that are remote-friendly, senior-level, and have a disclosed salary. Sounds straightforward, except the data looks like this:
| company | post |
|---|---|
| Airtable | Async-first team, 8+ yrs exp, $185-220K base |
| Vercel | Lead our NYC team. Competitive comp, DOE |
| Notion | In-office SF. Staff eng, $200K + equity |
| Linear | Bootcamp grads welcome! $85K, remote-friendly |
| Descript | Work from anywhere. Principal architect, $250K |
Now try writing deterministic rules for that.
- "Remote-friendly" could be "remote", "work from anywhere", "async-first", or implied by the absence of an office mention.
- "Senior-level" might be "8+ yrs", "Staff", "Principal", or "Lead" — but "Lead" could also be a junior team lead.
- "Salary disclosed" means actual numbers, not "Competitive comp" or "DOE."
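To see how brittle the deterministic route gets, here's a minimal sketch in pure Python (the posts are copied from the table above):

```python
import re

posts = [
    "Async-first team, 8+ yrs exp, $185-220K base",    # remote-friendly
    "Work from anywhere. Principal architect, $250K",  # remote-friendly
    "In-office SF. Staff eng, $200K + equity",         # not remote
]

# A first-pass deterministic rule: look for the word "remote".
remote_re = re.compile(r"\bremote\b", re.IGNORECASE)
flags = [bool(remote_re.search(p)) for p in posts]
# -> [False, False, False]: it misses both "async-first" and
# "work from anywhere", and every synonym you bolt on invites
# the next counterexample.
```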
What if you could just describe what you want?
everyrow lets you define fuzzy, qualitative logic in natural language and apply it to every row of a dataframe. The SDK handles LLM orchestration, structured outputs, and scaling; you specify the judgment criteria in plain English.
Here's the job screening example:
```python
import asyncio

import pandas as pd
from pydantic import BaseModel, Field

from everyrow.ops import screen

jobs = pd.DataFrame([
    {"company": "Airtable", "post": "Async-first team, 8+ yrs exp, $185-220K base"},
    {"company": "Vercel", "post": "Lead our NYC team. Competitive comp, DOE"},
    {"company": "Notion", "post": "In-office SF. Staff eng, $200K + equity"},
    {"company": "Linear", "post": "Bootcamp grads welcome! $85K, remote-friendly"},
    {"company": "Descript", "post": "Work from anywhere. Principal architect, $250K"},
])

class JobScreenResult(BaseModel):
    qualifies: bool = Field(description="True if meets ALL criteria")

async def main():
    result = await screen(
        task="""
        Qualifies if ALL THREE are met:
        1. Remote-friendly
        2. Senior-level (5+ yrs exp OR Senior/Staff/Principal in title)
        3. Salary disclosed (specific numbers, not "competitive" or "DOE")
        """,
        input=jobs,
        response_model=JobScreenResult,
    )
    print(result.data)

asyncio.run(main())
```
That's it. No regex, no threshold tuning, no parsing logic. The `screen` operation evaluates every row against your natural-language criteria using an LLM and returns structured results via a Pydantic model.
The output:
| company | qualifies |
|---|---|
| Airtable | True |
| Vercel | False |
| Notion | False |
| Linear | False |
| Descript | True |
- Airtable qualifies: async-first (remote-friendly), 8+ years (senior), $185-220K (salary disclosed).
- Descript qualifies: work from anywhere (remote), principal architect (senior), $250K (salary disclosed).
- The rest fail on at least one criterion: Vercel has no real salary, Notion is in-office, Linear isn't senior-level.
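Since the structured output comes back as a DataFrame (the table above), downstream filtering is ordinary pandas. A minimal sketch, reconstructing that output by hand rather than calling the API:

```python
import pandas as pd

# Stand-in for result.data from the screen() call above.
out = pd.DataFrame({
    "company": ["Airtable", "Vercel", "Notion", "Linear", "Descript"],
    "qualifies": [True, False, False, False, True],
})

# Plain boolean indexing keeps only the qualifying rows.
qualified = out[out["qualifies"]]
# -> Airtable and Descript
```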
## Sessions: Track Everything in a Dashboard

Every operation runs inside a session, a grouping of related operations that appears in the everyrow.io web UI. Sessions are created automatically, but for multi-step pipelines you'll want to create one explicitly:
```python
from everyrow import create_session
from everyrow.ops import screen, rank

async with create_session(name="Lead Qualification") as session:
    print(f"View at: {session.get_url()}")

    screened = await screen(
        session=session,
        task="Has a company email domain (not gmail, yahoo, etc.)",
        input=leads,
        response_model=ScreenResult,
    )

    ranked = await rank(
        session=session,
        task="Score by likelihood to convert",
        input=screened.data,
        field_name="conversion_score",
    )
```
The session URL gives you a live dashboard where you can monitor progress and inspect results while your script runs.
## Background Jobs for Large Datasets

All the operations above already use async/await. The `_async` variants are different: they're fire-and-forget, submitting work to the server and returning immediately so your script can continue:
```python
from everyrow.ops import screen_async

async with create_session(name="Background Screening") as session:
    task = await screen_async(
        session=session,
        task="Remote-friendly, senior-level, salary disclosed",
        input=large_dataframe,
    )
    print(f"Task ID: {task.task_id}")
    # do other work...
    result = await task.await_result()
```
If your script crashes, recover the result later using the task ID:
```python
from everyrow import fetch_task_data

df = await fetch_task_data("12345678-1234-1234-1234-123456789abc")
```
## Beyond Screening: What Else Can You Do?

`screen` is just one of several operations:
| Operation | What it does |
|---|---|
| Screen | Filter rows by criteria that require judgment |
| Rank | Score rows by qualitative factors |
| Dedupe | Deduplicate when fuzzy string matching isn't enough |
| Merge | Join tables when keys don't match exactly |
| Research | Run web agents to research each row |
Each operation takes a natural-language task description and a dataframe, and returns structured results. Same pattern, different capability.
## When to Use This (and When Not To)
everyrow is designed for cases where the logic is easy to describe but hard to code: screening, ranking, deduplication, and enrichment tasks where the criteria require judgment.
It's not a replacement for deterministic transformations. If you can write a reliable `df[df["salary"] > 100000]`, you should. Use everyrow for the columns where the values are natural language, inconsistent, or require world knowledge to interpret.
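In practice the two combine well: run the cheap deterministic filters first, then hand only the survivors' free-text columns to the LLM. A sketch of that hybrid pass (the pre-parsed `salary_usd` column here is illustrative):

```python
import pandas as pd

df = pd.DataFrame([
    {"company": "Notion",   "salary_usd": 200_000, "post": "In-office SF. Staff eng"},
    {"company": "Linear",   "salary_usd": 85_000,  "post": "Bootcamp grads welcome!"},
    {"company": "Descript", "salary_usd": 250_000, "post": "Work from anywhere."},
])

# Deterministic filter first: cheap, exact, no LLM needed.
candidates = df[df["salary_usd"] > 100_000]

# Only the surviving rows' free-text "post" column would then go to
# screen() for the judgment calls (remote-friendly, seniority).
```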
The tradeoff is latency and cost: LLM-based operations are slower and not free. For the job screening example above, processing 5 rows takes a few seconds and costs a fraction of a cent. For 10,000 rows, you'd want the async variants and should expect minutes rather than milliseconds. The docs cover scaling patterns for larger datasets.
## Get Started

```shell
pip install everyrow
export EVERYROW_API_KEY=your_key_here
```
Get a free API key at everyrow.io/api-key; it comes with $20 in free credit.
Full docs and more examples: everyrow.io/docs/getting-started