Building a Rent Fairness Calculator From 10,000+ Listing Data Points

#dublin #renting #ireland #proptech

Dublin has a rental price opacity problem. Property sale prices in Ireland are public, recorded in the Property Price Register with address and amount for every transaction. Rental prices are not. The official rental register, kept by the Residential Tenancies Board (RTB), runs 3-6 months behind, covers only registered tenancies, and publishes aggregate figures rather than per-property data. The salary you'd need to comfortably rent in Dublin (we broke that down here) makes this opacity particularly painful.

If you're a renter in Dublin trying to answer "is what I'm paying fair," you're working from a dataset of size one: the price you were offered, compared to a handful of other listings you happened to see when you were searching.

The product feature I wanted to build was a rent fairness calculator: given a property's attributes, where does its price sit relative to the current market? That meant building the data layer first. Here's what that involved.

The data source

Rental listing prices are the closest thing to real-time rental price data that exists in Ireland. They're asking prices, not agreed prices. In a market with little negotiation, asking prices are a reasonable proxy, and Irish rentals don't negotiate much: properties tend to let at asking, or the landlord adjusts the price and re-lists.

I aggregate listings from 90+ sources: the main portals (Daft, Rent.ie, MyHome), letting agency sites, property management company portals. Each listing has a price, location, size, bedroom count, and various attributes.

The working dataset for price analysis covers Dublin listings over a rolling window. At any given time, the active dataset is several thousand current listings. The historical set across all sources runs into the tens of thousands of records.

The normalization problem

Raw listing data from different sources is not comparable without normalization. The specific issues:

Price format. Some sources post weekly prices, some monthly. Some include bills, some don't. Agency sites sometimes post "pcm" (per calendar month) in some listings and "pw" (per week) in others. All prices get converted to monthly EUR at ingestion. This sounds simple and has about thirty edge cases.
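To make the weekly-to-monthly conversion concrete, here is a minimal sketch rather than the production parser. The function name, the period markers it recognises, and the choice to return None for ambiguous strings are all assumptions for illustration. The conversion multiplies a weekly price by 52/12, because a calendar month is longer than four weeks.

```python
import re
from typing import Optional

WEEKS_PER_YEAR = 52
MONTHS_PER_YEAR = 12

# Hypothetical period markers; real sources use many more variants.
_WEEKLY = re.compile(r"\b(?:pw|per week|weekly)\b", re.IGNORECASE)
_MONTHLY = re.compile(r"\b(?:pcm|per month|per calendar month|monthly)\b", re.IGNORECASE)
_AMOUNT = re.compile(r"(\d[\d,]*(?:\.\d+)?)")


def to_monthly_eur(raw_price: str) -> Optional[float]:
    """Convert a raw price string to monthly EUR, or None if it is ambiguous."""
    match = _AMOUNT.search(raw_price)
    if match is None:
        return None  # e.g. "Price on application"
    amount = float(match.group(1).replace(",", ""))
    if _WEEKLY.search(raw_price):
        # 52 weeks spread over 12 months, not a flat "times four".
        return round(amount * WEEKS_PER_YEAR / MONTHS_PER_YEAR, 2)
    if _MONTHLY.search(raw_price):
        return amount
    return None  # no period marker: leave it for a source-specific rule


print(to_monthly_eur("EUR 500 pw"))     # 2166.67
print(to_monthly_eur("EUR 2,200 pcm"))  # 2200.0
```

Returning None instead of guessing keeps the ambiguous cases visible, which is where most of the edge cases live.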

Bedroom count. "2 bedrooms + box room" is listed as 2 bedrooms on some sites and 3 on others. A box room is not a standard bedroom but some landlords list it as one to justify a higher price. I normalize to functional bedrooms, which means the box room goes to a separate field. This affects price comparisons: if you're comparing a 2-bed to other 2-beds and one of them has a box room that another landlord would call a third bedroom, your benchmark is off.
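A rough sketch of the functional-bedroom split, under the assumption that each source can be tagged with whether it counts a box room as a bedroom. The `Bedrooms` type, the regexes, and the `box_room_counted` flag are hypothetical names, not the actual schema.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bedrooms:
    functional: int  # rooms usable as a standard bedroom
    box_rooms: int   # kept in a separate field, as described above


_BED_COUNT = re.compile(r"(\d+)\s*(?:-\s*)?bed", re.IGNORECASE)
_BOX_ROOM = re.compile(r"box\s*room", re.IGNORECASE)


def normalize_bedrooms(text: str, box_room_counted: bool = False) -> Optional[Bedrooms]:
    """Split a listing's bedroom text into functional bedrooms and box rooms.

    box_room_counted is an assumed per-source flag: True when the source
    includes the box room in its headline bedroom number.
    """
    count = _BED_COUNT.search(text)
    if count is None:
        return None  # e.g. studios, handled elsewhere
    listed = int(count.group(1))
    box_rooms = 1 if _BOX_ROOM.search(text) else 0
    if box_rooms and box_room_counted:
        listed -= 1  # the headline number already includes the box room
    return Bedrooms(functional=max(listed, 0), box_rooms=box_rooms)


print(normalize_bedrooms("2 bedrooms + box room"))
print(normalize_bedrooms("3 bed (incl. box room)", box_room_counted=True))
```

Both calls resolve to two functional bedrooms plus one box room, so the two listings land in the same comparison group.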

Location. This is the hardest one. "Rathmines," "Dublin 6," "D6," and "Rathmines Village" can all describe the same place, depending on which source you're reading. Address normalization across sources requires fuzzy matching, postal district parsing, and a lookup table that maps informal neighborhood names to canonical areas.
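As an illustration of the lookup-table and postal-district pieces, here is a small sketch using only the standard library. The alias table is a tiny hypothetical sample, and `difflib` stands in for whatever fuzzy matcher a real pipeline would use.

```python
import difflib
import re
from typing import Optional

# Hypothetical sample; a real table would hold many more informal names.
AREA_ALIASES = {
    "rathmines": "Rathmines",
    "rathmines village": "Rathmines",
    "ranelagh": "Ranelagh",
    "portobello": "Portobello",
}

# Matches "Dublin 6", "D6", "Dublin 6W", "D6W".
_POSTAL = re.compile(r"\b(?:dublin\s*|d)(\d{1,2}w?)\b", re.IGNORECASE)


def parse_postal_district(text: str) -> Optional[str]:
    """Return a canonical district code such as 'D6', or None."""
    match = _POSTAL.search(text)
    return f"D{match.group(1).upper()}" if match else None


def resolve_area(text: str, cutoff: float = 0.8) -> Optional[str]:
    """Map an informal area name to a canonical one, tolerating small typos."""
    key = " ".join(re.sub(r"[^a-z ]", " ", text.lower()).split())
    if key in AREA_ALIASES:
        return AREA_ALIASES[key]
    close = difflib.get_close_matches(key, list(AREA_ALIASES), n=1, cutoff=cutoff)
    return AREA_ALIASES[close[0]] if close else None


print(parse_postal_district("Apartment 12, Rathmines Road, Dublin 6"))  # D6
print(resolve_area("Rathmines Village"))                                # Rathmines
print(resolve_area("Rathmine"))                                         # Rathmines
```

The district and the area name are returned separately because a postal district is coarser than a neighborhood; the sketch does not try to collapse one into the other.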

I use a two-stage location resolution: first parse to lat/lng from the address string (using a geocoding API for addresses that parse cleanly, fuzzy matching for ones that don't), then map lat/lng to neighborhood polygon using a Dublin neighborhood boundary dataset. The neighborhood polygon layer is what's used for price comparisons.
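The second stage, mapping a point to a neighborhood polygon, can be sketched with a plain ray-casting test. This is an illustration only: the polygons below are toy rectangles with made-up names, not the real boundary dataset, and a production system would more likely lean on a spatial library or PostGIS.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (lng, lat)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: count how many polygon edges a ray to the east crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def neighborhood_for(point: Point, polygons: Dict[str, List[Point]]) -> Optional[str]:
    """Return the first neighborhood whose polygon contains the point."""
    for name, ring in polygons.items():
        if point_in_polygon(point, ring):
            return name
    return None


# Toy rectangles for illustration only; not real Dublin boundaries.
TOY_POLYGONS = {
    "toy_west": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "toy_east": [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)],
}

print(neighborhood_for((1.0, 1.0), TOY_POLYGONS))  # toy_west
print(neighborhood_for((5.0, 1.0), TOY_POLYGONS))  # None
```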

BER rating. Every listing in Ireland is legally required to have a BER (Building Energy Rating). In practice, some listings omit it. For listings with a rating, it's a structured value (A1 through G) and reliable. For listings without, the field is null and treated as unknown.

The comparison model

With normalized data, the price fairness question becomes: given this property's bedroom count, neighborhood, and BER rating, what's the price distribution for comparable properties right now?

The comparison model groups listings by:

Bedroom count (exact match)
Neighborhood (same neighborhood polygon, or adjacent neighborhoods if the sample in the exact neighborhood is too small)
BER band (A/B, C, D, E/F/G) to avoid comparing an A-rated property to a G-rated one

Within that group, I calculate the median, 25th percentile, and 75th percentile of monthly price. The output for any given listing is where that listing's price falls in the distribution.
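A minimal sketch of the banding and the quartile placement, using the standard library. Two things here are assumptions: unknown BER values get their own band, and percentiles use the "inclusive" method of `statistics.quantiles`. The article's pipeline does this in dbt, so treat this as the idea, not the implementation.

```python
import statistics
from typing import List, Optional

# BER bands as grouped above; anything else falls through to "unknown".
_BER_BANDS = {
    "A": "A/B", "B": "A/B",
    "C": "C",
    "D": "D",
    "E": "E/F/G", "F": "E/F/G", "G": "E/F/G",
}


def ber_band(rating: Optional[str]) -> str:
    """Collapse a rating such as 'C2' into its comparison band."""
    letter = (rating or "").strip().upper()[:1]
    return _BER_BANDS.get(letter, "unknown")


def price_position(price: float, comparables: List[float]) -> str:
    """Describe where a price sits in the comparable group's distribution."""
    p25, median, p75 = statistics.quantiles(comparables, n=4, method="inclusive")
    if price < p25:
        return "bottom quartile"
    if price <= median:
        return "at or below median"
    if price <= p75:
        return "above median"
    return "top quartile"


# Illustrative prices, not real data.
comparables = [1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600]
print(ber_band("C2"))                     # C
print(price_position(2550, comparables))  # top quartile
```

The function returns a position, not a verdict, which matches how the result is worded in the product.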

The framing in the product is deliberate: "this property is in the top quartile for 2-beds in Rathmines at BER C" rather than "this property is overpriced." Whether the premium is justified is a user judgment, not a data judgment. Maybe it has better light. Maybe the landlord is unusually responsive. The data removes the information gap; it doesn't replace the decision.

The sample size problem

Dublin's neighborhoods vary significantly in listing volume. D2 (city centre) and D6 (Rathmines/Ranelagh) have enough active listings at any time that a 2-bed price distribution is meaningful. Smaller neighborhoods might have 3-4 active 2-beds at any point.

When the sample in the target neighborhood is too small (below a threshold I set at around 8-10 listings), I expand to adjacent neighborhoods. This introduces some noise because adjacent neighborhoods aren't identical markets. I flag to the user when the comparison is using an expanded geographic area so they understand the basis.
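The fallback can be sketched as follows. The threshold of 8 is taken from the low end of the range above; the adjacency map, the function name, and the return type are hypothetical.

```python
from typing import Dict, List, NamedTuple

MIN_SAMPLE = 8  # low end of the 8-10 threshold mentioned above


class Comparables(NamedTuple):
    prices: List[float]
    expanded: bool  # True when adjacent neighborhoods were pulled in


def select_comparables(
    target: str,
    prices_by_area: Dict[str, List[float]],
    adjacency: Dict[str, List[str]],
    min_sample: int = MIN_SAMPLE,
) -> Comparables:
    """Use the target neighborhood alone if it has enough listings,
    otherwise add listings from adjacent neighborhoods and flag it."""
    prices = list(prices_by_area.get(target, []))
    own_count = len(prices)
    if own_count >= min_sample:
        return Comparables(prices, expanded=False)
    for neighbor in adjacency.get(target, []):
        prices.extend(prices_by_area.get(neighbor, []))
    return Comparables(prices, expanded=len(prices) > own_count)


# Illustrative data only.
prices_by_area = {"A": [2000.0] * 9, "B": [1900.0] * 3, "C": [1950.0] * 4, "D": [2100.0] * 6}
adjacency = {"B": ["C", "D"]}

result = select_comparables("B", prices_by_area, adjacency)
print(len(result.prices), result.expanded)  # 13 True
```

Carrying the `expanded` flag alongside the prices is what lets the interface tell the user which area the comparison is based on.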

The alternative was to only show price context when the sample is large enough to be reliable. That turns out to leave a lot of users without any context because most Dublin suburbs have thin listing volume at any given bedroom count. The expanded geography approach, with transparency about what's being expanded, is more useful in practice.

Historical trends

With enough historical data, you can calculate trends: how has the median price for a 2-bed in Rathmines changed over the past 12 months? This is useful for users trying to time their search or understand whether the market is moving.
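A trend of this kind reduces to a median per month plus a percentage change between two months. The sketch below assumes each record is a (listing date, monthly price) pair for one comparison group; the figures are made up for illustration.

```python
import statistics
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple


def monthly_medians(listings: Iterable[Tuple[date, float]]) -> Dict[str, float]:
    """Median monthly price per calendar month, keyed as 'YYYY-MM'."""
    by_month: Dict[str, List[float]] = defaultdict(list)
    for listed_on, price in listings:
        by_month[listed_on.strftime("%Y-%m")].append(price)
    return {month: statistics.median(prices) for month, prices in sorted(by_month.items())}


def pct_change(medians: Dict[str, float], start: str, end: str) -> float:
    """Percentage change in the median between two months."""
    return round((medians[end] - medians[start]) / medians[start] * 100, 1)


# Made-up listings for one group, twelve months apart.
listings = [
    (date(2024, 1, 5), 2000.0), (date(2024, 1, 20), 2100.0), (date(2024, 1, 25), 2200.0),
    (date(2025, 1, 3), 2200.0), (date(2025, 1, 9), 2310.0), (date(2025, 1, 15), 2400.0),
]
medians = monthly_medians(listings)
print(medians)                                    # {'2024-01': 2100.0, '2025-01': 2310.0}
print(pct_change(medians, "2024-01", "2025-01"))  # 10.0
```

With thin monthly samples the median jumps around, so the sample-size caveats from the previous section apply to trends as well.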

The limitation: we've been running long enough to have meaningful trend data, but not long enough to have multi-year trends. The next 12-18 months of data will make this feature significantly more useful.

The thing I can't build from listing data: agreed prices. If a landlord lists at EUR 2,200 and lets at EUR 2,050, the agreed price never reaches me. The gap between asking and agreed prices in Dublin is probably 2-5% in most areas (the market doesn't negotiate much), but "probably" isn't good enough if you want precise fairness benchmarks. The RTB data eventually captures this, but with a lag that makes it useless for current decisions.

Implementation notes

The pipeline: Python for crawling and parsing, Postgres for storage, dbt for the normalization and comparison model transformations. The price distribution calculations are pre-computed on a daily schedule rather than run at query time, which keeps the user-facing API fast.

The geocoding step is the slowest part of ingestion, roughly 200-400ms per address with a good geocoding provider. Results are cached aggressively, so addresses that have already been geocoded don't hit the API again.
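The caching idea, sketched with an in-memory dictionary. The class name, the key normalization, and the injected `lookup` callable are assumptions; a real pipeline would presumably persist the cache (for example in Postgres) instead of holding it in memory.

```python
from typing import Callable, Dict, Optional, Tuple

LatLng = Tuple[float, float]


class CachedGeocoder:
    """Wraps a geocoding call so each distinct address is looked up once."""

    def __init__(self, lookup: Callable[[str], Optional[LatLng]]) -> None:
        self._lookup = lookup  # the slow provider call, injected
        self._cache: Dict[str, Optional[LatLng]] = {}
        self.api_calls = 0

    @staticmethod
    def _key(address: str) -> str:
        # Normalize case, commas, and whitespace so trivial variants share a key.
        return " ".join(address.lower().replace(",", " ").split())

    def geocode(self, address: str) -> Optional[LatLng]:
        key = self._key(address)
        if key not in self._cache:
            self.api_calls += 1
            # Failed lookups (None) are cached too, to avoid retrying them.
            self._cache[key] = self._lookup(address)
        return self._cache[key]


def fake_provider(address: str) -> Optional[LatLng]:
    """Stand-in for a real geocoding API; returns a fixed made-up point."""
    return (53.32, -6.26)


geocoder = CachedGeocoder(fake_provider)
geocoder.geocode("12 Main St, Rathmines")
geocoder.geocode("12 main st  rathmines")
print(geocoder.api_calls)  # 1
```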

The user-facing version, which covers how to actually use this data to check your own situation, is at https://homescout.io/guide/am-i-overpaying-rent-dublin-2026. Less pipeline detail, more "here's the number and what it means."

Caspar Bannink. Founder of HomeScout.io. Building AI-powered rental search for Dublin.