I run a small portfolio of brick-and-mortar businesses. Every time I evaluated a new location — gas station, auto shop, retail space — I'd spend days pulling data from Google Maps, census reports, traffic studies, and competitor lists. Then I'd dump it all into a spreadsheet and try to make a decision.
It was slow, inconsistent, and subjective. So I built an algorithm to do it.
## The Problem: Site Selection is Broken
Most small business owners pick locations based on gut feeling, a broker's pitch, or "it felt right when I drove by." The data exists to make better decisions, but it's scattered across dozens of sources and there's no standardized way to score it.
Enterprise players use tools like Placer.ai ($500+/mo) or hire consultants ($5K+ per report). Small operators get nothing.
I wanted something that could score any address on a standardized scale — like a credit score, but for business locations.
## The 10-Factor Algorithm
After analyzing what actually correlates with business success at physical locations, I landed on 10 dimensions:
| # | Factor | Weight | What It Measures |
|---|---|---|---|
| 1 | Traffic Volume | 15% | Daily vehicle/foot count from DOT + Google Popular Times |
| 2 | Demographics | 12% | Median income, population density, age distribution |
| 3 | Competition Density | 12% | Same-category businesses within radius |
| 4 | Accessibility | 10% | Road type, intersection quality, public transit |
| 5 | Visibility | 10% | Frontage, signage potential, setback distance |
| 6 | Parking & Layout | 8% | Lot size, spaces per sqft, flow |
| 7 | Anchor Proximity | 8% | Distance to Walmart, Target, major chains |
| 8 | Growth Trajectory | 8% | 5-year population trend, permit activity |
| 9 | Market Saturation | 9% | Revenue-per-capita vs. business count |
| 10 | Risk Factors | 8% | Crime rate, flood zone, vacancy rate |
Total: 100% weighted score.
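The composite is just a weighted sum of the normalized factor scores. A minimal sketch (`FACTOR_WEIGHTS`, `compositeScore`, and the sample inputs are my illustrative names and numbers, not the production code):

```javascript
// Weights from the 10-factor table above (they sum to 1.0)
const FACTOR_WEIGHTS = {
  traffic: 0.15, demographics: 0.12, competition: 0.12,
  accessibility: 0.10, visibility: 0.10, parking: 0.08,
  anchors: 0.08, growth: 0.08, saturation: 0.09, risk: 0.08,
};

// Each factor is first normalized to 0-100, then weighted and summed
function compositeScore(factors) {
  return Object.entries(FACTOR_WEIGHTS).reduce(
    (sum, [name, weight]) => sum + (factors[name] ?? 0) * weight,
    0
  );
}

// Example with made-up factor scores
const score = compositeScore({
  traffic: 80, demographics: 70, competition: 60,
  accessibility: 90, visibility: 75, parking: 65,
  anchors: 85, growth: 70, saturation: 55, risk: 80,
});
```

Because every factor lives on the same 0-100 scale before weighting, the composite is also bounded to 0-100, which is what makes the "credit score" framing work.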
## The Grading System
Raw scores get mapped to letter grades using distribution-based thresholds:
```javascript
function calculateGrade(score, population) {
  const percentile = getPercentileRank(score, population);
  if (percentile < 40) return { grade: 'NR', label: 'Not Rated' };
  if (percentile < 55) return { grade: 'C', label: "That Ain't It" };
  if (percentile < 70) return { grade: 'B', label: 'Good but not Great' };
  if (percentile < 85) return { grade: 'A', label: 'Pull the Trigger' };
  if (percentile < 95) return { grade: 'A+', label: 'Unicorn' };
  return { grade: 'AAA', label: 'Great White Buffalo' };
}
```
The labels are intentionally blunt:
- C = "That Ain't It" — meets minimum criteria but has significant drawbacks
- B = "Good but not Great" — solid fundamentals, notable limitations
- A = "Pull the Trigger" — strong across all dimensions
- A+ = "Unicorn" — exceptional, rare find
- AAA = "Great White Buffalo" — statistical anomaly, once in a lifetime
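`calculateGrade` leans on a `getPercentileRank` helper that isn't shown. A simple empirical version (my sketch, assuming the population is just an array of previously computed composite scores):

```javascript
// Percentile rank: share of the population scoring strictly below `score`
function getPercentileRank(score, population) {
  if (population.length === 0) return 50; // no data yet: assume median
  const below = population.filter((s) => s < score).length;
  return (below / population.length) * 100;
}

// Example: a score of 80 against a small population of prior scores
const rank = getPercentileRank(80, [40, 55, 60, 72, 80, 91]);
```

With tiny populations this rank is noisy, which is exactly why the thresholds recalibrate as the dataset grows (next section).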
## Self-Improving Thresholds
Here's the part I'm most proud of. The algorithm never stops learning.
Every scored location feeds back into the dataset. As more locations get analyzed, the averages recalibrate, thresholds shift, and grades become more accurate.
```javascript
async function recalibrateThresholds(env) {
  // KV list() returns key names only, so fetch each value explicitly
  const { keys } = await env.SCORES_DB.list();
  const dataset = await Promise.all(
    keys.map((k) => env.SCORES_DB.get(k.name, 'json'))
  );

  // Recalculate population statistics
  const stats = calculateDistribution(dataset);

  // Update grade boundaries based on the actual distribution
  await env.THRESHOLDS.put('current', JSON.stringify({
    updated: new Date().toISOString(),
    sample_size: dataset.length,
    mean: stats.mean,
    stddev: stats.stddev,
    percentiles: stats.percentiles,
  }));
}
```
With 10 scored locations, the grades are rough estimates. With 1,000, they're statistically meaningful. With 10,000, they're approaching ground truth.
## Data Sources (All Free or Cheap)
One design constraint: keep data costs near zero.
| Source | Data | Cost |
|---|---|---|
| US Census API | Demographics, income, population | Free |
| Google Places API | Competitor count, ratings, categories | $0.032/call |
| Google Maps Platform | Geocoding, distance matrix | Free (generous tier) |
| DOT Traffic Data | AADT counts by road segment | Free (public data) |
| FEMA Flood Maps | Flood zone classification | Free |
| FBI UCR | Crime statistics by area | Free |
| Building Permits API | Construction/growth activity | Free (varies by county) |
Total cost per location analysis: roughly $0.15–$0.30.
Compare that to $500+/mo for Placer.ai or $5K for a consultant report.
## The Architecture
The whole thing runs on Cloudflare Workers (serverless, $0/mo hosting):
```
User enters address
  → Geocode via Google Maps
  → Fan out to 10 data sources in parallel
  → Normalize each factor to 0-100
  → Apply weights
  → Calculate composite score
  → Map to letter grade
  → Store result (feeds back into thresholds)
  → Return report
```
Processing time: ~3 seconds per address.
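The fan-out step is where `Promise.all` earns its keep: each source is an independent fetch, so total latency is roughly the slowest source rather than the sum of all ten. A sketch with stub fetchers (the function names and stub values are mine, standing in for the real data-source calls):

```javascript
// Stub fetchers standing in for the real data-source calls (illustrative only)
const geocode = async (addr) => ({ lat: 45.0, lng: -93.2 });
const fetchTrafficCounts = async (coords) => 82;
const fetchCensusData = async (coords) => 74;
const fetchCompetitors = async (coords) => 61;

async function analyzeAddress(address) {
  const coords = await geocode(address);

  // Fan out in parallel: total latency ≈ slowest source, not the sum
  const [traffic, demographics, competition] = await Promise.all([
    fetchTrafficCounts(coords),
    fetchCensusData(coords),
    fetchCompetitors(coords),
    // ...the remaining seven sources follow the same pattern
  ]);

  return { traffic, demographics, competition };
}
```

If any single source fails, `Promise.all` rejects the whole batch; swapping in `Promise.allSettled` lets a report degrade gracefully when one data source is down.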
## Context Matters
A raw score isn't enough. The algorithm factors in context:
- Time of year: Seasonal businesses score differently in summer vs. winter
- Market conditions: Vacancy rates shift what "good" looks like
- Location-specific factors: A gas station at a highway exit is evaluated differently than one in a residential neighborhood
- Competition quality: Five competitors with 2-star reviews are a very different threat than five with 4.8-star reviews
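One simple way to model these adjustments, and this is a sketch of the idea rather than the actual implementation, is as multipliers applied to individual factor scores before weighting. All the thresholds and multiplier values below are hypothetical:

```javascript
// Hypothetical context adjustments applied before the weighted sum
function applyContext(factorScores, context) {
  const adjusted = { ...factorScores };

  // Seasonal business measured in the off-season: discount traffic
  if (context.seasonal && context.offSeason) {
    adjusted.traffic *= 0.85;
  }

  // High local vacancy shifts what "good" saturation looks like
  if (context.vacancyRate > 0.15) {
    adjusted.saturation *= 0.9;
  }

  // Strong competitors hurt more than weak ones
  if (context.avgCompetitorRating >= 4.5) {
    adjusted.competition *= 0.9;
  }

  // Clamp everything back into the 0-100 range
  for (const k of Object.keys(adjusted)) {
    adjusted[k] = Math.min(100, Math.max(0, adjusted[k]));
  }
  return adjusted;
}
```

Keeping context as a separate pass means the core weighting stays stable while the adjustment rules can evolve independently.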
## Real Example
I ran the algorithm on a gas station I was evaluating in suburban Minnesota:
```
Traffic Volume:     82/100 (28,000 AADT on adjacent road)
Demographics:       74/100 (median income $78K, growing area)
Competition:        61/100 (4 stations within 2 miles)
Accessibility:      88/100 (signalized intersection, easy in/out)
Visibility:         79/100 (corner lot, good frontage)
Parking:            70/100 (adequate but tight)
Anchor Proximity:   85/100 (Walmart 0.3 mi, Target 0.8 mi)
Growth Trajectory:  77/100 (12% population growth in 5 yr)
Market Saturation:  58/100 (slightly oversaturated)
Risk Factors:       81/100 (low crime, no flood zone)

Composite Score: 75.4
Grade: A — "Pull the Trigger"
```
I bought it. It's been profitable since month one.
## What I Learned Building This
**Weighted scoring beats gut feeling every time.** Even rough data, systematically analyzed, outperforms intuition.

**The grade scale matters more than the score.** Nobody remembers "75.4," but everyone remembers "A — Pull the Trigger."

**Self-improving algorithms are worth the extra complexity.** The recalibration loop is the difference between a static spreadsheet and a living system.

**Free data is surprisingly good.** Census, DOT, FEMA — there's more free public data than most people realize.
## Try It
I packaged this into a tool called SiteSweep — $79 one-time purchase, no subscription. Enter an address, get a full 10-factor report with a letter grade in seconds.
It's built for small business owners, franchise operators, and commercial real estate investors who want data-driven site selection without enterprise pricing.
The whole stack runs on Cloudflare Workers with zero hosting costs. Deep dive on the architecture: How We Run SaaS With Zero Hosting Costs.
What factors would you add to the scoring algorithm? I'm always looking to improve the model. Drop your thoughts in the comments.