<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Caspar Bannink</title>
    <description>The latest articles on DEV Community by Caspar Bannink (@caspar_bannink_3728f095d1).</description>
    <link>https://dev.to/caspar_bannink_3728f095d1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3908117%2F6c89943c-9c97-4b2e-beff-8330fe8c2390.png</url>
      <title>DEV Community: Caspar Bannink</title>
      <link>https://dev.to/caspar_bannink_3728f095d1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/caspar_bannink_3728f095d1"/>
    <language>en</language>
    <item>
      <title>What I Learned Normalizing Dublin Rental Listings from Messy Public Sources</title>
      <dc:creator>Caspar Bannink</dc:creator>
      <pubDate>Tue, 02 Jun 2026 01:55:59 +0000</pubDate>
      <link>https://dev.to/caspar_bannink_3728f095d1/what-i-learned-normalizing-dublin-rental-listings-from-messy-public-sources-48de</link>
      <guid>https://dev.to/caspar_bannink_3728f095d1/what-i-learned-normalizing-dublin-rental-listings-from-messy-public-sources-48de</guid>
      <description>&lt;p&gt;I started HomeScout because Dublin renting is painful in a very specific way: the market is fragmented, fast-moving, and full of near-duplicates. At first I thought the hard part would be the AI layer. It was not.&lt;/p&gt;

&lt;p&gt;The hard part was turning messy rental listings into a dataset that was stable enough for software to reason over.&lt;/p&gt;

&lt;p&gt;An "AI rental search" product sounds like a natural-language interface problem. In practice, the useful version is mostly data engineering: normalize listing fields, detect duplicates, infer locations, spot stale records, preserve provenance, and only then let an AI system rank or explain anything.&lt;/p&gt;

&lt;p&gt;This is what I learned building the listing pipeline behind HomeScout.&lt;/p&gt;

&lt;h2&gt;
  
  
  A rental listing is not a clean object
&lt;/h2&gt;

&lt;p&gt;The naive schema is simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;title
address
price
beds
baths
description
source_url
photos
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That works for a demo. It breaks as soon as you combine sources.&lt;/p&gt;

&lt;p&gt;One source may expose the postal district as a clean field. Another may bury it in the title. One listing says "Dublin 8"; another says "Kilmainham"; another says "near Heuston"; all three may refer to the same practical search area. Some listings include BER ratings, some do not. Some include available dates. Some have "contact agent" instead of a direct email. Some silently change price without changing URL.&lt;/p&gt;

&lt;p&gt;The first lesson: a listing is not one object. It is a current observation of a property-like thing from a source.&lt;/p&gt;

&lt;p&gt;That distinction matters. I ended up treating each source listing as an observation, then linking observations to a normalized listing candidate.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;source_observation
  source
  source_listing_id
  source_url
  raw_title
  raw_address
  raw_price
  raw_description
  first_seen_at
  last_seen_at
  raw_payload_hash

normalized_listing
  canonical_title
  normalized_price_eur
  beds
  baths
  inferred_area
  inferred_postal_district
  geo_confidence
  dedupe_group_id
  availability_state
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That extra layer sounds boring, but it prevents a lot of downstream mistakes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deduplication is fuzzy, not exact
&lt;/h2&gt;

&lt;p&gt;Duplicate rental listings rarely match perfectly.&lt;/p&gt;

&lt;p&gt;The same apartment can appear with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;slightly different titles&lt;/li&gt;
&lt;li&gt;reordered address fragments&lt;/li&gt;
&lt;li&gt;different photo counts&lt;/li&gt;
&lt;li&gt;one source saying "2 bedroom apartment" and another saying "2 bed flat"&lt;/li&gt;
&lt;li&gt;a price changed by 50 euro&lt;/li&gt;
&lt;li&gt;a letting agent reposting after a stale listing expires&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Exact URL matching is not enough. Exact address matching is not enough either, because addresses are often incomplete or phrased inconsistently.&lt;/p&gt;

&lt;p&gt;The dedupe approach that worked best was a scored match across multiple weak signals:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;same_or_similar_address       +35
same_postal_district          +10
same_bed_count                +15
price_within_small_delta      +15
title_similarity_high         +10
shared_photo_fingerprint      +30
same_agent_or_agency          +10
seen_within_recent_window     +10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important part is that no single signal is trusted absolutely. Address is strong, but not always present. Photos are strong, but not always stable. Price is useful, but listings change price. Agent identity helps, but large agencies list many similar apartments.&lt;/p&gt;

&lt;p&gt;I also keep the dedupe decision explainable. If two observations are grouped, the system stores the reason and score. That makes it possible to undo bad merges later.&lt;/p&gt;

&lt;p&gt;Bad dedupe is worse than no dedupe. If you merge two different apartments, every ranking, alert, and user note attached to that listing becomes suspect.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stale listings need a state machine
&lt;/h2&gt;

&lt;p&gt;Rental listings disappear quickly. Some are removed. Some are reposted. Some become stale without ever being explicitly marked unavailable.&lt;/p&gt;

&lt;p&gt;The first version of my pipeline treated "not found in latest scrape" as unavailable. That was too aggressive. Sources can fail, pages can be incomplete, and rate limits can produce partial results.&lt;/p&gt;

&lt;p&gt;The better model is a small state machine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;active -&amp;gt; missing_once -&amp;gt; missing_repeatedly -&amp;gt; stale -&amp;gt; archived
active -&amp;gt; price_changed
active -&amp;gt; content_changed
stale -&amp;gt; active_reposted
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That gives the system tolerance for noisy crawls. A listing does not vanish because one run missed it. It only becomes stale after repeated evidence.&lt;/p&gt;

&lt;p&gt;This also matters for alerts. A reposted stale listing is not the same as a new listing, but for a renter it might still be relevant. A price drop is not a new listing either, but it can be more important than a new listing.&lt;/p&gt;

&lt;p&gt;So the event stream needs more nuance than "new property found."&lt;/p&gt;

&lt;h2&gt;
  
  
  Address ambiguity is the hardest product problem
&lt;/h2&gt;

&lt;p&gt;Dublin addresses are messy from a search perspective.&lt;/p&gt;

&lt;p&gt;Users think in areas, commutes, postal districts, landmarks, and transport lines. Listings use a mix of all of those.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Dublin 6"&lt;/li&gt;
&lt;li&gt;"Rathmines"&lt;/li&gt;
&lt;li&gt;"near Ranelagh Luas"&lt;/li&gt;
&lt;li&gt;"city centre"&lt;/li&gt;
&lt;li&gt;"Docklands"&lt;/li&gt;
&lt;li&gt;"Grand Canal"&lt;/li&gt;
&lt;li&gt;"Dublin 2"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These overlap but are not interchangeable.&lt;/p&gt;

&lt;p&gt;I split location handling into three layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Raw location text from the source.&lt;/li&gt;
&lt;li&gt;Inferred structured labels: area, postal district, locality.&lt;/li&gt;
&lt;li&gt;Geographic confidence: exact, approximate, area-level, unknown.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The confidence field is important. If a listing only says "Dublin city centre", the system should not pretend it has precise coordinates. It can still be useful, but the UI and ranking need to know the location is approximate.&lt;/p&gt;

&lt;p&gt;This also affects natural-language search. If a user says "near the DART", that should not be solved by an LLM inventing areas. It should resolve through a deterministic lookup table of stations, corridors, and distance bands.&lt;/p&gt;

&lt;p&gt;LLMs are useful for translating messy user intent into structured constraints. They are not a good source of geographic truth.&lt;/p&gt;

&lt;h2&gt;
  
  
  Price normalization is not just parsing euros
&lt;/h2&gt;

&lt;p&gt;Most listings are monthly rent, but the raw text still needs care.&lt;/p&gt;

&lt;p&gt;Common problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;commas and periods in different places&lt;/li&gt;
&lt;li&gt;"per month" omitted&lt;/li&gt;
&lt;li&gt;bills included or excluded&lt;/li&gt;
&lt;li&gt;sharing listings mixed with whole-property listings&lt;/li&gt;
&lt;li&gt;parking or utility fees mentioned in description&lt;/li&gt;
&lt;li&gt;price changes on the same URL&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For HomeScout, I normalize rent to monthly EUR and store the raw price string separately. If a price changes, that is an event, not just a field update.&lt;/p&gt;

&lt;p&gt;I also avoid over-normalizing things I cannot prove. If a description says "bills included", that becomes a flag with source evidence. If it only implies bills might be included, it stays unknown.&lt;/p&gt;

&lt;p&gt;This is where a lot of AI products quietly go wrong: they convert uncertainty into false certainty because clean fields are easier to rank.&lt;/p&gt;

&lt;h2&gt;
  
  
  Provenance matters more than it sounds
&lt;/h2&gt;

&lt;p&gt;Every normalized field should know where it came from.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;beds = 2
source = parsed_title
confidence = high

area = Rathmines
source = address_text
confidence = medium

pet_friendly = unknown
source = no_positive_evidence
confidence = low
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That makes the AI explanation layer much safer.&lt;/p&gt;

&lt;p&gt;Instead of saying:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This listing is pet friendly.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;it can say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I did not find a pet policy in the listing text.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That difference matters because users act on these explanations. If the system cannot distinguish "false" from "unknown", it will mislead people.&lt;/p&gt;

&lt;h2&gt;
  
  
  The AI layer should be downstream
&lt;/h2&gt;

&lt;p&gt;The architecture that has worked best is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;collect observations
normalize fields
dedupe candidates
infer location with confidence
track listing state
build user-specific hard filters
rank candidates
use AI to explain, draft, and summarize
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LLM is not the database. It is not the source of truth. It sits after the deterministic pipeline.&lt;/p&gt;

&lt;p&gt;For example, if a renter says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"2 bed near the Luas under 2200, ideally not too far from Grand Canal"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;the system should parse that into:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"beds_min"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"max_price_eur"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"transport"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"luas"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"soft_area_preference"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Grand Canal"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then normal database queries and geographic lookups do the heavy lifting. The AI can help explain why a listing matched, draft an inquiry email, or summarize tradeoffs, but it should not hallucinate inventory.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I would do earlier next time
&lt;/h2&gt;

&lt;p&gt;If I were starting again, I would invest earlier in three things.&lt;/p&gt;

&lt;p&gt;First, raw observation storage. Keep the raw payloads or at least stable hashes and extracted raw fields. You will need them when a normalized decision looks wrong.&lt;/p&gt;

&lt;p&gt;Second, confidence scores. Not ML confidence in the fancy sense, just explicit quality labels for inferred fields. Exact address is not the same as inferred area. Unknown is not false.&lt;/p&gt;

&lt;p&gt;Third, event history. Renters care about changes: new listing, price drop, repost, stale, reactivated. A snapshot table alone loses that.&lt;/p&gt;

&lt;p&gt;The main lesson is that AI is only useful if the underlying data model is honest about uncertainty.&lt;/p&gt;

&lt;p&gt;For rental search, the hard technical problem is not making a chatbot that talks about apartments. It is building a data pipeline that knows what it knows, knows what it guessed, and does not blur the two.&lt;/p&gt;

&lt;p&gt;That is the part I underestimated.&lt;/p&gt;




&lt;p&gt;I am building HomeScout for Dublin renters: &lt;a href="https://homescout.io" rel="noopener noreferrer"&gt;https://homescout.io&lt;/a&gt;&lt;/p&gt;

</description>
      <category>dublin</category>
      <category>renting</category>
      <category>ireland</category>
      <category>proptech</category>
    </item>
    <item>
      <title>Claude Opus 4.8 Is Not Just a Benchmark Bump</title>
      <dc:creator>Caspar Bannink</dc:creator>
      <pubDate>Tue, 02 Jun 2026 00:59:13 +0000</pubDate>
      <link>https://dev.to/caspar_bannink_3728f095d1/claude-opus-48-is-not-just-a-benchmark-bump-2lgc</link>
      <guid>https://dev.to/caspar_bannink_3728f095d1/claude-opus-48-is-not-just-a-benchmark-bump-2lgc</guid>
      <description>&lt;p&gt;Claude Opus 4.8 matters for a more practical reason than "new flagship model shipped."&lt;/p&gt;

&lt;p&gt;Anthropic launched it on May 28, 2026 as an Opus upgrade aimed directly at coding, AI agents, and long-running professional work. On the official product page, Anthropic describes it as a hybrid reasoning model for coding and AI agents with a 1M context window, and says it has the consistency and autonomy to keep working on long-running tasks.&lt;/p&gt;

&lt;p&gt;That framing is more important than the usual leaderboard chatter, because coding agents rarely fail in dramatic ways. They fail by losing the thread, using tools badly, stopping early, or quietly making the wrong edit and moving on.&lt;/p&gt;

&lt;p&gt;If Anthropic's positioning is right, Opus 4.8 is not mainly a smarter chatbot. It is an attempt to improve the part that matters when the work gets long, messy, and expensive.&lt;/p&gt;

&lt;h2&gt;
  
  
  The benchmark story is real, but incomplete
&lt;/h2&gt;

&lt;p&gt;There is a benchmark angle here, and it is worth taking seriously. Artificial Analysis currently places Claude Opus 4.8 at the top of its Intelligence Index frontier cluster, ahead of GPT-5.5 xhigh and GPT-5.5 high in the comparison snapshot linked below.&lt;/p&gt;

&lt;p&gt;That is enough to say Opus 4.8 belongs in the top frontier tier. It is not enough to say it is universally the best model for every workflow.&lt;/p&gt;

&lt;p&gt;That distinction matters because nobody actually buys "a benchmark." You buy a working system: model, provider, latency profile, tool behavior, cost envelope, and how often the thing finishes a real task without supervision.&lt;/p&gt;

&lt;h2&gt;
  
  
  The coding table matters more than the general leaderboard
&lt;/h2&gt;

&lt;p&gt;For a release like this, the first table I want is not a general "who is smartest?" table. It is the coding-agent table.&lt;/p&gt;

&lt;p&gt;The reported SWE-Bench Pro numbers are the clearest version of that right now:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Coding benchmark&lt;/th&gt;
&lt;th&gt;Reported score&lt;/th&gt;
&lt;th&gt;Why it matters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.8&lt;/td&gt;
&lt;td&gt;SWE-Bench Pro&lt;/td&gt;
&lt;td&gt;69.2%&lt;/td&gt;
&lt;td&gt;Strongest reported coding-agent score in this comparison&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.7&lt;/td&gt;
&lt;td&gt;SWE-Bench Pro&lt;/td&gt;
&lt;td&gt;64.3%&lt;/td&gt;
&lt;td&gt;Shows the direct Opus-to-Opus improvement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5&lt;/td&gt;
&lt;td&gt;SWE-Bench Pro&lt;/td&gt;
&lt;td&gt;58.6%&lt;/td&gt;
&lt;td&gt;Still frontier-class, but behind this reported Opus 4.8 result&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 3.1 Pro&lt;/td&gt;
&lt;td&gt;SWE-Bench Pro&lt;/td&gt;
&lt;td&gt;54.2%&lt;/td&gt;
&lt;td&gt;Useful broad frontier comparison point&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I would not overread this table. SWE-Bench is not your repo, your test suite, your code review standard, or your deployment budget.&lt;/p&gt;

&lt;p&gt;But I would not ignore it either. A 4.9 point jump from Opus 4.7 to 4.8 on a coding benchmark is exactly the kind of thing that can matter in agent loops, especially if the model also gets better at flagging weak code instead of confidently moving on.&lt;/p&gt;

&lt;h2&gt;
  
  
  The price table is the other half
&lt;/h2&gt;

&lt;p&gt;The second table is cost. A better model is not automatically a better default model.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model or mode&lt;/th&gt;
&lt;th&gt;Input price&lt;/th&gt;
&lt;th&gt;Output price&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.8 standard&lt;/td&gt;
&lt;td&gt;$5 / 1M tokens&lt;/td&gt;
&lt;td&gt;$25 / 1M tokens&lt;/td&gt;
&lt;td&gt;Same standard list price as Opus 4.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.8 fast mode&lt;/td&gt;
&lt;td&gt;$10 / 1M tokens&lt;/td&gt;
&lt;td&gt;$50 / 1M tokens&lt;/td&gt;
&lt;td&gt;Research preview, roughly 2.5x faster output according to Anthropic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.7 standard&lt;/td&gt;
&lt;td&gt;$5 / 1M tokens&lt;/td&gt;
&lt;td&gt;$25 / 1M tokens&lt;/td&gt;
&lt;td&gt;Useful baseline because 4.8 replaces this in the same price band&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.7 fast mode&lt;/td&gt;
&lt;td&gt;$30 / 1M tokens&lt;/td&gt;
&lt;td&gt;$150 / 1M tokens&lt;/td&gt;
&lt;td&gt;Reported previous fast-mode price point&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That makes the release more interesting than a normal model bump. The standard price stays flat, but the fast path becomes much less painful.&lt;/p&gt;

&lt;p&gt;For coding agents, speed is not cosmetic. If an agent is reading files, editing code, running tests, reviewing output, and doing another pass, every turn has a latency tax. Fast mode can change whether a workflow feels usable, even if it is not the cheapest way to burn tokens.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Anthropic is actually claiming
&lt;/h2&gt;

&lt;p&gt;Anthropic's own language is unusually specific. The official Opus page calls it a model that pushes the frontier for coding and AI agents, and the launch materials say it is stronger across coding, agentic tasks, and professional work.&lt;/p&gt;

&lt;p&gt;That is a stronger claim than generic intelligence. It is a claim about operational behavior:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;coding performance&lt;/li&gt;
&lt;li&gt;agentic execution&lt;/li&gt;
&lt;li&gt;tool use&lt;/li&gt;
&lt;li&gt;consistency on long tasks&lt;/li&gt;
&lt;li&gt;autonomy over multi-step work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the right lens for evaluating it. The useful question is not just whether a score moved up. The useful question is whether the model reduces failure inside a real coding loop.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Opus 4.7 to 4.8 jump looks incremental, but that can still matter
&lt;/h2&gt;

&lt;p&gt;Anthropic itself describes Opus 4.8 as a modest but tangible improvement on its predecessor. That reads as credible.&lt;/p&gt;

&lt;p&gt;This does not look like a category reset. It looks like an iterative frontier upgrade with sharper emphasis on coding, better judgment during agentic work, and better behavior over long tasks. Anthropic also says early testers found it more reliable and more likely to flag uncertainty instead of overstating progress, which is exactly the kind of improvement that matters in autonomous workflows.&lt;/p&gt;

&lt;p&gt;That kind of delta can be commercially meaningful even if it does not look dramatic in a launch graphic.&lt;/p&gt;

&lt;p&gt;A coding agent does not need to be universally better at everything to be more useful. It needs to hold context longer, recover more cleanly, and make fewer silent bad decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude Code is part of the release, not a footnote
&lt;/h2&gt;

&lt;p&gt;The most relevant tooling angle is Claude Code.&lt;/p&gt;

&lt;p&gt;Opus 4.8 did not arrive alone. Anthropic also pushed Dynamic Workflows in Claude Code as a research preview. The idea is simple: for a large coding task, Claude can split the job across many parallel subagents, verify work, and combine the result before reporting back.&lt;/p&gt;

&lt;p&gt;That matters because model quality and coding-tool design are starting to blur together. A stronger coding model is useful. A stronger coding model inside a tool that can plan, fan out, check work, and recover from bad branches is more interesting.&lt;/p&gt;

&lt;p&gt;This is also where the Codex comparison becomes relevant.&lt;/p&gt;

&lt;p&gt;My Codex setup is already built around model routing and subagent orchestration. That means the practical comparison is not only "Opus 4.8 versus GPT-5.5." It is:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workflow&lt;/th&gt;
&lt;th&gt;Model angle&lt;/th&gt;
&lt;th&gt;Tooling angle&lt;/th&gt;
&lt;th&gt;What I would test&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code + Opus 4.8&lt;/td&gt;
&lt;td&gt;Strong coding-agent benchmark position&lt;/td&gt;
&lt;td&gt;Dynamic Workflows, fast mode, Claude-native agent loops&lt;/td&gt;
&lt;td&gt;Large repo migration, failing test repair, multi-file refactor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex + OpenAI models&lt;/td&gt;
&lt;td&gt;Strong OpenAI coding stack and local routing&lt;/td&gt;
&lt;td&gt;Explicit orchestrator plus subagents, review, verification loops&lt;/td&gt;
&lt;td&gt;Same repo task with identical success criteria&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cursor or editor agents&lt;/td&gt;
&lt;td&gt;Fast interactive coding loop&lt;/td&gt;
&lt;td&gt;IDE-native context and diffs&lt;/td&gt;
&lt;td&gt;Smaller edits, review latency, developer control&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I have not run a controlled Claude Code versus Codex test on Opus 4.8 yet, so I would not claim a winner.&lt;/p&gt;

&lt;p&gt;But this is the test that matters. Not which model writes the best launch demo. Which stack gets a messy change through implementation, tests, review, and cleanup with the least babysitting.&lt;/p&gt;

&lt;h2&gt;
  
  
  The workflow test matters more than the launch claim
&lt;/h2&gt;

&lt;p&gt;Anthropic's release page says Opus 4.8 improves on benchmarks across coding, agentic skills, reasoning, and practical knowledge work tasks. That is a useful signal, and it is stronger than vague marketing language because Anthropic at least anchors the claim in a system card and named evaluations.&lt;/p&gt;

&lt;p&gt;Still, a launch page is not the same thing as production proof.&lt;/p&gt;

&lt;p&gt;For anyone building or buying coding agents, the real evaluation stack is broader:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;benchmark position&lt;/li&gt;
&lt;li&gt;coding-task reliability&lt;/li&gt;
&lt;li&gt;context window&lt;/li&gt;
&lt;li&gt;tool-use behavior&lt;/li&gt;
&lt;li&gt;long-horizon autonomy&lt;/li&gt;
&lt;li&gt;latency&lt;/li&gt;
&lt;li&gt;throughput&lt;/li&gt;
&lt;li&gt;token economics&lt;/li&gt;
&lt;li&gt;provider quality&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last point gets underrated. The same model name can feel very different depending on where you run it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Provider choice changes the economics
&lt;/h2&gt;

&lt;p&gt;This is one of the most practical parts of the story.&lt;/p&gt;

&lt;p&gt;On Artificial Analysis's provider benchmarking page for Claude Opus 4.8, Amazon is the fastest by output speed at 64.4 tokens per second, Anthropic follows at 62.1, and Google is close behind at 60.1. For latency, Google leads at 7.36 seconds to first token, Amazon is at 10.31 seconds, and Anthropic is at 20.02 seconds. Artificial Analysis also shows all three at the same blended benchmark price of $4.10 per 1M tokens in that comparison.&lt;/p&gt;

&lt;p&gt;That is a useful reminder that "which model?" is only half the routing decision. "Which provider?" can materially change the experience even when the underlying model is identical.&lt;/p&gt;

&lt;p&gt;For coding agents, that matters a lot. A sluggish provider can make a disciplined workflow feel mushy. A faster path can make repeated tool calls, verification passes, and long-context work feel much more usable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost is more interesting than the headline suggests
&lt;/h2&gt;

&lt;p&gt;Anthropic's official list pricing is unchanged from Opus 4.7: $5 per million input tokens and $25 per million output tokens for regular usage. Fast mode is listed at $10 per million input tokens and $50 per million output tokens.&lt;/p&gt;

&lt;p&gt;Anthropic also says fast mode can run at 2.5x the speed and is now three times cheaper than it was for previous models. That makes the pricing story more interesting than "same price as before." The company is not just holding the base line steady. It is also trying to improve the speed-cost tradeoff for teams that care about turnaround time.&lt;/p&gt;

&lt;p&gt;That official list pricing is separate from the Artificial Analysis provider screenshot. Anthropic's numbers are list price from the release page. Artificial Analysis's $4.10 figure is a blended benchmarking view across providers, not the official posted token rate.&lt;/p&gt;

&lt;p&gt;In practice, the more important number is still cost per useful completed task.&lt;/p&gt;

&lt;p&gt;A model that looks slightly expensive on paper can be cheaper in the real world if it finishes more runs cleanly, needs fewer retries, and wastes less review time. A model that looks cheap by token can become expensive if it stalls, drifts, or burns time with poor tool behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means for GPT-5.5 comparisons
&lt;/h2&gt;

&lt;p&gt;The cleanest supported comparison is the narrow one.&lt;/p&gt;

&lt;p&gt;Artificial Analysis places Opus 4.8 slightly ahead of GPT-5.5 xhigh and GPT-5.5 high in the current frontier ranking snapshot. That supports the claim that Opus 4.8 is in the very top cluster and currently has a slight edge in that benchmark view.&lt;/p&gt;

&lt;p&gt;It does not support a sweeping claim that Opus 4.8 beats GPT-5.5 everywhere.&lt;/p&gt;

&lt;p&gt;That is fine, because "best model" is usually the wrong question anyway. The useful questions are narrower:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;best for long coding-agent runs&lt;/li&gt;
&lt;li&gt;best for low-latency interaction&lt;/li&gt;
&lt;li&gt;best for strict budget pressure&lt;/li&gt;
&lt;li&gt;best for giant context reads&lt;/li&gt;
&lt;li&gt;best for tool reliability&lt;/li&gt;
&lt;li&gt;best for unattended execution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are different buying decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I think this release is worth watching
&lt;/h2&gt;

&lt;p&gt;I spend most of my day building agentic software for HomeScout, my rental search product, so I care less about launch-day screenshots than about whether a model keeps working when a task gets long and annoying.&lt;/p&gt;

&lt;p&gt;That is why Opus 4.8 stands out.&lt;/p&gt;

&lt;p&gt;Not because one leaderboard moved. Not because every frontier lab says the new one is better. But because the release is explicitly aimed at coding and agentic behavior, the benchmark position is strong, the official pricing is clear, and the provider-level differences are large enough to affect real deployments.&lt;/p&gt;

&lt;p&gt;The next useful evidence will not be another announcement thread. It will be whether agent teams start reporting fewer failed runs, cleaner tool use, and better long-horizon task completion with Opus 4.8 in production.&lt;/p&gt;

&lt;p&gt;Until then, the practical takeaway is simple: Claude Opus 4.8 looks like a serious top-tier option for coding agents, but the real decision is still workflow fit, provider routing, and cost per completed task.&lt;/p&gt;

&lt;h2&gt;
  
  
  Source links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/claude/opus" rel="noopener noreferrer"&gt;Anthropic product page&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/news/claude-opus-4-8" rel="noopener noreferrer"&gt;Anthropic release page&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://artificialanalysis.ai/models/claude-opus-4-8" rel="noopener noreferrer"&gt;Artificial Analysis model page&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://artificialanalysis.ai/models/claude-opus-4-8/providers" rel="noopener noreferrer"&gt;Artificial Analysis provider page&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://platform.claude.com/docs/en/docs/about-claude/pricing" rel="noopener noreferrer"&gt;Anthropic pricing docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.llmreference.com/compare/claude-opus-4-8/gpt-5.5" rel="noopener noreferrer"&gt;Claude Opus 4.8 launch coverage with SWE-Bench Pro comparison&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://venturebeat.com/technology/anthropics-claude-opus-4-8-is-here-with-3x-cheaper-fast-mode-and-near-mythos-level-alignment" rel="noopener noreferrer"&gt;VentureBeat coverage of fast mode and Dynamic Workflows&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;I am Caspar Bannink, founder of HomeScout (AI rental search for Dublin) and Bannink Software Development.&lt;/p&gt;

&lt;p&gt;Check out my side project: &lt;a href="https://homescout.io" rel="noopener noreferrer"&gt;homescout.io&lt;/a&gt;&lt;br&gt;&lt;br&gt;
Personal LinkedIn: &lt;a href="https://www.linkedin.com/in/caspar-bannink-719440217/" rel="noopener noreferrer"&gt;linkedin.com/in/caspar-bannink-719440217&lt;/a&gt;&lt;br&gt;&lt;br&gt;
HomeScout LinkedIn: &lt;a href="https://www.linkedin.com/company/homescout-io" rel="noopener noreferrer"&gt;linkedin.com/company/homescout-io&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>coding</category>
      <category>benchmarks</category>
    </item>
    <item>
      <title>Data Normalization Across Dublin Rental Portals: How to Make Listings Comparable</title>
      <dc:creator>Caspar Bannink</dc:creator>
      <pubDate>Wed, 27 May 2026 12:26:54 +0000</pubDate>
      <link>https://dev.to/caspar_bannink_3728f095d1/data-normalization-across-dublin-rental-portals-how-to-make-listings-comparable-6jm</link>
      <guid>https://dev.to/caspar_bannink_3728f095d1/data-normalization-across-dublin-rental-portals-how-to-make-listings-comparable-6jm</guid>
      <description>&lt;h1&gt;
  
  
  Data Normalization Across Dublin Rental Portals: How to Make Listings Comparable
&lt;/h1&gt;

&lt;p&gt;Dublin rental listings are fragmented even across the main portals. Daft.ie and Rent.ie use different structures, labels, price conventions, and quirks, which makes direct comparison harder than it should be.&lt;/p&gt;

&lt;p&gt;When I built the comparison layer for HomeScout, the aggregation part turned out to be straightforward compared to normalization. Here's what the normalization problem actually looks like, and how I approached it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The raw data problem
&lt;/h2&gt;

&lt;p&gt;Consider something as simple as price. Across sources you'll see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;"€1,750 per month"&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;"1750 pcm"&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;"£1750/month"&lt;/code&gt; (some UK-registered portals still do this for Dublin listings)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;"From €1,700"&lt;/code&gt; (minimum of a range)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;"Price on application"&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;"1,750"&lt;/code&gt; with currency implied by context&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;"€1,750 per month + utilities"&lt;/code&gt; (you have to decide whether to strip the utility note or flag it)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's one field. Beds has similar variation: "2 bed", "2 bedroom", "Two bedrooms", "2BR", "2+1" (2 beds plus a box room). Some sources omit beds entirely and describe the property type instead.&lt;/p&gt;

&lt;p&gt;Area/location is the worst. Sources use different geographic taxonomies. One uses Dublin postal codes (D4, D6, D6W). Another uses neighborhood names (Rathmines, Ranelagh, Rathgar). Another uses the street address and nothing else. Some use both but inconsistently. The same property can appear as "Ranelagh, Dublin 6" on one source and "Dublin 6" on another, and you have to know those are the same area.&lt;/p&gt;

&lt;h2&gt;
  
  
  The normalization pipeline
&lt;/h2&gt;

&lt;p&gt;Each source gets a custom extractor that produces a raw record. The raw record has whatever fields the source provides, with light cleaning (strip HTML, trim whitespace, decode entities). No interpretation yet.&lt;/p&gt;

&lt;p&gt;The normalization step runs after extraction. It takes the raw record and produces a canonical record with typed, standardized fields.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Price normalization:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;normalize_price&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Returns (monthly_eur, price_qualifier)
    qualifier: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;exact&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; | &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;from&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; | &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;on_application&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; | &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;raw_price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

    &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw_price&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;application&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;poa&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;on_application&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

    &lt;span class="c1"&gt;# Extract numeric value
&lt;/span&gt;    &lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;[^\d,.]&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;amount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

    &lt;span class="c1"&gt;# Weekly to monthly conversion
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;per week&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/week&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pw&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;52&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;qualifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;from&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;from&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;exact&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;qualifier&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The qualifier field matters for display. A "from" price should be labeled differently than an exact price in comparison views.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bedroom normalization:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Word-to-number mapping handles written numbers. The "+1" box room convention gets flagged separately so you can filter on actual bedrooms vs. "bedrooms including box room."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Geographic normalization:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the hard part. My approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Extract any Dublin postcode from the raw location string (regex for D1-D24, D6W)&lt;/li&gt;
&lt;li&gt;If no postcode, attempt a fuzzy match against a neighborhood lookup table&lt;/li&gt;
&lt;li&gt;If that fails, geocode the street address and assign to the containing postal district&lt;/li&gt;
&lt;li&gt;If all else fails, store the raw string and flag for manual review&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The neighborhood lookup table is a maintained JSON file with aliases. "Ranelagh" maps to D6. "Rathmines" maps to D6. "Rathgar" maps to D6. "Harold's Cross" maps to D6W. And so on. It's not glamorous but it works.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ranelagh"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"district"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"D6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"canonical_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Ranelagh"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"rathmines"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"district"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"D6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"canonical_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Rathmines"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"rathgar"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"district"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"D6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"canonical_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Rathgar"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"harolds cross"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"district"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"D6W"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"canonical_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Harold's Cross"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Deduplication
&lt;/h2&gt;

&lt;p&gt;The same property often appears on multiple sources. Deduplication is a separate pass after normalization.&lt;/p&gt;

&lt;p&gt;I use a blocking strategy: only compare listings within the same price band (+/- 10%) and same area (same district or neighboring districts). Within a block, I compute a similarity score based on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Address string similarity (Levenshtein on the normalized address)&lt;/li&gt;
&lt;li&gt;Price match (exact or within 5%)&lt;/li&gt;
&lt;li&gt;Bed/bath match&lt;/li&gt;
&lt;li&gt;Available date proximity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Score above threshold: mark as duplicate, keep the richest record (the one with more fields populated), store source provenance for both.&lt;/p&gt;

&lt;p&gt;The threshold needs tuning. Too low and you collapse distinct listings. Too high and you miss obvious duplicates. I landed on a score that errs toward keeping separate records when uncertain, because a false merge (showing one listing when there are two) is worse than a false non-merge (showing the same property twice).&lt;/p&gt;

&lt;h2&gt;
  
  
  Making listings comparable in the UI
&lt;/h2&gt;

&lt;p&gt;After normalization, every listing has the same fields in the same format. The comparison view is then straightforward: pick listings to compare, render their canonical fields side by side.&lt;/p&gt;

&lt;p&gt;The useful columns for Dublin rentals turned out to be: price, beds, baths, area (with DART/Luas proximity calculated from lat/lng), included utilities, pet policy, available date, and lease term. I surface which fields came from structured source data vs. which were inferred from description text, because inferred data has lower reliability.&lt;/p&gt;

&lt;p&gt;I wrote a more user-facing version of this at &lt;a href="https://homescout.io/guide/how-to-compare-dublin-apartments-without-spreadsheet" rel="noopener noreferrer"&gt;https://homescout.io/guide/how-to-compare-dublin-apartments-without-spreadsheet&lt;/a&gt; if you want to see what the normalized comparison looks like in practice.&lt;/p&gt;

&lt;p&gt;The normalization work is not exciting. It's the kind of thing that takes three times longer than you expect and surfaces a new edge case every week. But it's the foundation everything else sits on.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Caspar Bannink. Founder of HomeScout.io. Building AI-powered rental search for Dublin.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>data</category>
      <category>dataengineering</category>
      <category>sideprojects</category>
      <category>webscraping</category>
    </item>
    <item>
      <title>Building a Rental Aggregator When Daft.ie Already Exists</title>
      <dc:creator>Caspar Bannink</dc:creator>
      <pubDate>Wed, 27 May 2026 12:26:43 +0000</pubDate>
      <link>https://dev.to/caspar_bannink_3728f095d1/building-a-rental-aggregator-when-daftie-already-exists-45j0</link>
      <guid>https://dev.to/caspar_bannink_3728f095d1/building-a-rental-aggregator-when-daftie-already-exists-45j0</guid>
      <description>&lt;h1&gt;
  
  
  Building a Rental Aggregator When Daft.ie Already Exists
&lt;/h1&gt;

&lt;p&gt;When people find out I'm building a rental search platform for Dublin, the first question is usually some version of: "But why? Daft is already there."&lt;/p&gt;

&lt;p&gt;It's a fair question. Daft.ie has dominant market share, brand recognition, a large team, and a listings database built over two decades. If I'm building something in the same space, I need a reason that goes beyond "I'll make it better." Better at what? By whose measure? And what's stopping Daft from just adding that feature?&lt;/p&gt;

&lt;p&gt;Here's how I actually thought about it, and what the technical implications turned out to be.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Daft does that I can't replicate
&lt;/h2&gt;

&lt;p&gt;Before anything else: what Daft has that I don't have is supply-side lock-in. Landlords and letting agencies list on Daft because that's where renters look. Renters look on Daft because that's where landlords list. This is a genuine two-sided network effect that took years to build and can't be engineered around.&lt;/p&gt;

&lt;p&gt;I'm not going to out-list Daft. My product doesn't have a listings CMS, a landlord login, or a monetized ad product. I don't want those things. That's not the game I'm playing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The gap that aggregation addresses
&lt;/h2&gt;

&lt;p&gt;Daft has a lot of listings. It doesn't have all of them.&lt;/p&gt;

&lt;p&gt;The Dublin rental market has listings spread across Rent.ie, MyHome, smaller agency portals, property management company websites, and secondary platforms. Some letting agencies and landlords don't post to Daft at all. Others post there but also post to their own site with photos or descriptions that differ slightly.&lt;/p&gt;

&lt;p&gt;When I was looking for a flat myself, I was checking six separate sites manually. That's the gap: not better listings, but a unified view of the listings that already exist.&lt;/p&gt;

&lt;p&gt;The question then becomes: can I build a technical product that's genuinely better at aggregation than a user doing manual searches? And is that worth doing?&lt;/p&gt;

&lt;h2&gt;
  
  
  The aggregation architecture
&lt;/h2&gt;

&lt;p&gt;The core is a crawler that runs on a schedule, pulling listing data from Daft.ie and Rent.ie. The challenges that weren't obvious until I was in it:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Source heterogeneity.&lt;/strong&gt; Each source has its own structure. Daft has a clean API-like interface. Small letting agency sites are often hand-built with inconsistent HTML. Property management company sites sometimes generate listings dynamically in JavaScript, which complicates standard scraping. You end up with a per-source adapter layer that handles idiosyncrasies, feeding a shared normalization layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deduplication.&lt;/strong&gt; The same apartment often appears on three or four sources simultaneously. Without deduplication, a user sees the same property four times and thinks the market has more supply than it does. Deduplication based on address alone doesn't work reliably because addresses aren't formatted consistently. I use a combination of address fuzzy matching, price comparison, and image fingerprinting (when photos are available) to group duplicates. It's not perfect. False positives (merging two distinct listings) are worse than false negatives (showing a duplicate), so I tune conservative.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Change detection.&lt;/strong&gt; Knowing that a listing is new requires knowing what was there before. Knowing that a listing is gone requires distinguishing "temporarily de-listed" from "taken." I keep a snapshot of each crawl and diff it against the previous one. Listings that are absent for two consecutive crawl cycles get marked as likely gone rather than immediately. This reduces false "this listing was taken" alerts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Price normalization.&lt;/strong&gt; Some sources list weekly prices, some monthly. Some include bills, some don't. All prices in the system get converted to monthly EUR before storage. This sounds trivial and took an embarrassing amount of time to get right across all sources.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why not just build on top of Daft's data?
&lt;/h2&gt;

&lt;p&gt;The obvious question. Daft has terms of service that prohibit scraping. More practically, building your entire product on one source you don't control creates a single point of failure that a cease-and-desist, a terms change, or a UI redesign can eliminate overnight.&lt;/p&gt;

&lt;p&gt;The multi-source architecture is a risk hedge as much as a product feature. Any single source can break or become unavailable. Crawlers fall behind when sites update their structure. Source diversity means no individual source going offline kills the product.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this actually competes with Daft
&lt;/h2&gt;

&lt;p&gt;It doesn't compete on listing volume for properties that are Daft-exclusive. It competes on:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-source coverage.&lt;/strong&gt; If a letting agency posts a property on their own site and nowhere else, my system finds it. Daft doesn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Search.&lt;/strong&gt; Daft's search is filter-based: price range, beds, area from a dropdown. My search accepts natural language and handles the translation to structured filters. "2-bed near Ranelagh Luas, under 1900, pet-friendly" resolves to a structured query without the user having to manually set each filter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Alerts.&lt;/strong&gt; My alerts cover Daft.ie and Rent.ie simultaneously. A user doesn't have to set up and maintain separate alert logic on both portals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Features Daft hasn't prioritized.&lt;/strong&gt; AI lease review, application tracking, email composer. Features that are useful for the renter's workflow beyond just finding listings.&lt;/p&gt;

&lt;p&gt;None of these make me "better than Daft" on Daft's terms. On the specific things I'm competing on, the product is genuinely different.&lt;/p&gt;

&lt;h2&gt;
  
  
  The strategic frame
&lt;/h2&gt;

&lt;p&gt;The word "competing" is actually a bit wrong here. My product wraps publicly available listing data from Daft.ie and Rent.ie into a faster search and alert workflow. Daft is a source of inventory, not just a competitor. The relationship is asymmetric: they don't know or care I exist; I depend on continued public availability.&lt;/p&gt;

&lt;p&gt;This creates a dependency risk I think about seriously. The mitigation is source diversity, features that add value beyond the listings themselves, and building enough user value that switching away from Daft's data specifically wouldn't kill the product.&lt;/p&gt;

&lt;p&gt;I wrote a longer breakdown of the full comparison between portals and aggregators at &lt;a href="https://homescout.io/guide/better-than-daft-dublin-rentals" rel="noopener noreferrer"&gt;https://homescout.io/guide/better-than-daft-dublin-rentals&lt;/a&gt; if you want the user-facing version of this.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Caspar Bannink. Founder of HomeScout.io. Building AI-powered rental search for Dublin.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devjournal</category>
      <category>product</category>
      <category>sideprojects</category>
      <category>startup</category>
    </item>
    <item>
      <title>Building search features for users in different timezones. The remote renter problem.</title>
      <dc:creator>Caspar Bannink</dc:creator>
      <pubDate>Wed, 27 May 2026 12:24:54 +0000</pubDate>
      <link>https://dev.to/caspar_bannink_3728f095d1/building-search-features-for-users-in-different-timezones-the-remote-renter-problem-olk</link>
      <guid>https://dev.to/caspar_bannink_3728f095d1/building-search-features-for-users-in-different-timezones-the-remote-renter-problem-olk</guid>
      <description>&lt;p&gt;When I looked at who was actually signing up to HomeScout in the early months, the timezone distribution surprised me. A meaningful chunk of users were browsing and setting up alerts from outside Ireland entirely. Netherlands, Germany, Australia, the US. People who needed a Dublin apartment but weren't in Dublin yet.&lt;/p&gt;

&lt;p&gt;That cohort has a fundamentally different set of constraints than a local searcher. Building for them forced some decisions I wouldn't have made otherwise.&lt;/p&gt;

&lt;h2&gt;
  
  
  The core UX problem: latency on a fast-moving market
&lt;/h2&gt;

&lt;p&gt;Dublin rental listings have a short half-life. Something priced fairly in a good area will get 50-100 inquiries in 24 hours and be gone in 48. The search UX that works for a local user (check the app a few times a day, act when something looks good) fails completely for someone who's 8 hours behind or 10 hours ahead.&lt;/p&gt;

&lt;p&gt;The design response to this is: push the user toward a decision earlier. The alert has to fire fast enough and contain enough information to act on immediately, without requiring a second step of "open the app and check."&lt;/p&gt;

&lt;p&gt;Our alerts include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full listing title, address, price&lt;/li&gt;
&lt;li&gt;Bedrooms, BER rating, available date&lt;/li&gt;
&lt;li&gt;Commute time to the user's saved workplace (calculated at send time, not load time)&lt;/li&gt;
&lt;li&gt;Direct link to both the HomeScout detail page and the original source listing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is that a user in a different timezone, woken up by a push notification at 6am their time, can make a go/no-go decision before they're even out of bed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Commute calculation as a core feature, not a filter
&lt;/h2&gt;

&lt;p&gt;Most rental search tools treat commute as a filter you can optionally apply. You draw a radius or specify a max journey time. That's fine if you know the city's transport geography.&lt;/p&gt;

&lt;p&gt;If you've never been to Dublin, you don't know that two neighborhoods that look equally close to city center on a map can have radically different commute times due to transport infrastructure. You don't know that the coastal DART line is reliable and fast, while a bus-dependent area of the same distance might add 30+ minutes to a commute.&lt;/p&gt;

&lt;p&gt;We built commute as a first-class search parameter. You type your workplace, and the UI shows commute time alongside price and size rather than buried in a filter panel. When a user hasn't set a workplace, we prompt them to do it during onboarding rather than after they've already run a search.&lt;/p&gt;

&lt;p&gt;The data layer: we use a combination of routing APIs for public transport journey planning. The challenge is that "commute time" depends on departure time, and users searching from abroad might not know what a realistic Dublin rush-hour departure looks like. We calculate using a standard weekday morning window (7:30-9:00am) and show that assumption explicitly rather than pretending we're giving a precise number.&lt;/p&gt;

&lt;h2&gt;
  
  
  Neighbourhood context without having visited
&lt;/h2&gt;

&lt;p&gt;If I search for apartments in a city I've never been to and the listing says "vibrant area close to all amenities," I learn nothing useful. That phrase appears in listings in both the nicest and least nice streets in Dublin.&lt;/p&gt;

&lt;p&gt;We built neighborhood context cards that try to surface genuinely useful information: the character of the area, the noise profile (especially relevant near nightlife strips), transport options, and rough price positioning relative to comparable areas.&lt;/p&gt;

&lt;p&gt;The hard part is that this information is qualitative and changes. What was "up-and-coming" in 2022 may be settled now, or may still be rough depending on which block you're on. We've had to make editorial choices about how to characterize areas rather than purely relying on scraped data, and those choices need updating.&lt;/p&gt;

&lt;p&gt;One approach we use: cross-referencing listing data with the distribution of listing prices and turnover rates in each area. Areas where listings disappear fast at or above asking price signal high actual demand. Areas with more inventory that sits longer tell a different story. This is less subjective than editorial characterization and updates automatically as new listings flow in.&lt;/p&gt;

&lt;h2&gt;
  
  
  The credibility problem for remote applicants
&lt;/h2&gt;

&lt;p&gt;This one isn't a technical problem, it's a product problem. Landlords sometimes prefer local applicants because they can meet them. A remote applicant asking for special consideration (skip the in-person viewing, sign before arriving) is asking for trust they haven't established.&lt;/p&gt;

&lt;p&gt;The product response is helping users communicate their situation well in their first inquiry. Remote renters who state their arrival date, employment situation, and reason for the remote search upfront get better outcomes. The inquiry email composer we built surfaces a "remote applicant" context option that prompts users to include that information.&lt;/p&gt;

&lt;p&gt;What we don't have yet: any kind of verified identity or background check layer that would let a landlord have more confidence in a remote applicant. That's a real gap. It would probably require partnering with a tenant referencing service, which exists in the Irish market but adds friction that most users in early search stage aren't ready for.&lt;/p&gt;

&lt;h2&gt;
  
  
  Timing features vs timezone detection
&lt;/h2&gt;

&lt;p&gt;We debated auto-detecting user timezone and adjusting UI accordingly (surfacing listings that were just posted vs. listings from hours ago, for example). We haven't built this. The complexity of getting timezone-aware UX right without making it feel weird to local users isn't worth it at our current scale.&lt;/p&gt;

&lt;p&gt;What we did instead: show "listed X hours ago" very prominently on every listing card. This is simple and tells the remote user the one thing they most need to know: is this listing still worth pursuing or has it probably already gone?&lt;/p&gt;

&lt;p&gt;If a listing is 6+ hours old in a competitive area, a remote user should know that their odds of getting a reply are significantly lower. Making that information front-and-center changes their behavior appropriately. They focus on the fresh listings and set up alerts for new ones rather than chasing cold ones.&lt;/p&gt;

&lt;p&gt;I wrote a longer guide on the full remote apartment-hunting process in Dublin here: &lt;a href="https://homescout.io/guide/finding-dublin-apartment-from-another-country" rel="noopener noreferrer"&gt;https://homescout.io/guide/finding-dublin-apartment-from-another-country&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Caspar Bannink. Founder of HomeScout.io. Building AI-powered rental search for Dublin.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>product</category>
      <category>startup</category>
      <category>ux</category>
      <category>webdev</category>
    </item>
    <item>
      <title>State management for real-world workflows: tracking apartment viewings and applications</title>
      <dc:creator>Caspar Bannink</dc:creator>
      <pubDate>Wed, 27 May 2026 12:24:53 +0000</pubDate>
      <link>https://dev.to/caspar_bannink_3728f095d1/state-management-for-real-world-workflows-tracking-apartment-viewings-and-applications-elh</link>
      <guid>https://dev.to/caspar_bannink_3728f095d1/state-management-for-real-world-workflows-tracking-apartment-viewings-and-applications-elh</guid>
      <description>&lt;p&gt;Apartment hunting is a stateful workflow. At any given moment, a user might have 15 listings in various stages: some saved, some inquired about, some with scheduled viewings, some attended, some applied to, some rejected. Each has associated data: emails, notes, documents, times, decisions pending.&lt;/p&gt;

&lt;p&gt;Modeling this well is a product design problem more than a technical one. But the technical decisions you make about state structure shape what the product can actually do.&lt;/p&gt;

&lt;p&gt;This is what I've learned building the application tracker for HomeScout.&lt;/p&gt;

&lt;h2&gt;
  
  
  The state machine
&lt;/h2&gt;

&lt;p&gt;The viewing/application workflow has a natural state machine structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Saved → Inquired → Viewing Scheduled → Attended → Applied → Outcome
                                              ↓                  ↓
                                          Not Interested     Accepted / Rejected
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With additional states like "Rescheduled" (a substate of "Viewing Scheduled") and "Waiting for references" (a substate of "Applied").&lt;/p&gt;

&lt;p&gt;The question is how much of this to expose to the user. Early versions exposed the full pipeline with all substates. Users found it overwhelming. They didn't want to maintain an accurate state machine; they wanted a rough picture of where things stood.&lt;/p&gt;

&lt;p&gt;Current version has seven states: Saved, Contacted, Viewing Scheduled, Viewed, Applied, Offer, Closed (rejected/accepted/withdrawn). That's enough structure to give an overview without requiring micro-updates to every state transition.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data model decisions that matter
&lt;/h2&gt;

&lt;p&gt;Each tracked listing needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reference to the canonical listing (price, location, photos, specs)&lt;/li&gt;
&lt;li&gt;Viewing datetime (nullable, set when scheduled)&lt;/li&gt;
&lt;li&gt;User notes (free text, the most-used field)&lt;/li&gt;
&lt;li&gt;Status enum&lt;/li&gt;
&lt;li&gt;Last-action timestamp (for sorting the list by recency of activity)&lt;/li&gt;
&lt;li&gt;Reminder datetime (nullable, for follow-up nudges)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The canonical listing reference is tricky. Listings on rental portals disappear when let. If a user is tracking an application and the listing goes dead, you need to have captured the listing data rather than holding a reference to a URL that will 404. (Most renters who are tracking 10+ properties simultaneously are also dealing with the speed problem — &lt;a href="https://homescout.io/guide/free-rental-alerts-dublin" rel="noopener noreferrer"&gt;setting up real-time alerts&lt;/a&gt; is what keeps the pipeline from going stale before you can even schedule viewings.)&lt;/p&gt;

&lt;p&gt;We snapshot the listing data at the point of save: title, price, address, description, photos (or at least photo URLs, with the understanding they may eventually 404). This creates storage overhead but is necessary for the tracker to remain useful after a listing closes.&lt;/p&gt;

&lt;p&gt;The alternative (storing only a URL) fails silently in the worst way: the user's tracker looks fine but clicking through gives them nothing, exactly when they're trying to reference details before an interview or application.&lt;/p&gt;

&lt;h2&gt;
  
  
  The calendar view problem
&lt;/h2&gt;

&lt;p&gt;Showing upcoming viewings in a calendar seems straightforward. The complications:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Timezone.&lt;/strong&gt; Users searching from abroad (a meaningful chunk of our user base) may have their device in a different timezone than Dublin. We store viewing datetimes in UTC with Ireland timezone display. We show the timezone explicitly in the viewing confirmation so remote users know what "Thursday 2pm" means in their local time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conflict detection.&lt;/strong&gt; Users sometimes accidentally schedule two viewings at the same time, or with insufficient travel time between them. We flag conflicts (exact time overlap) but don't try to flag "you've booked two viewings 15 minutes apart and they're across the city." That requires knowing travel time between the two viewing addresses, which we could calculate but haven't yet. Simple conflict detection catches the obvious case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Duration assumption.&lt;/strong&gt; We default to 30 minutes per viewing block in the calendar. Most viewings are shorter, some run longer. We let users edit duration but most don't. The 30-minute default gives enough visual separation in the calendar to make conflicts obvious.&lt;/p&gt;

&lt;h2&gt;
  
  
  Notes: the underrated feature
&lt;/h2&gt;

&lt;p&gt;Post-viewing notes are the feature users engage with most, measured by time spent interacting. They enter them right after a viewing (sometimes before they've left the building) and come back to them when making a decision a week later.&lt;/p&gt;

&lt;p&gt;The design decisions that made this work better:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No formatting.&lt;/strong&gt; Early version had a rich text editor. Users wanted to type fast and didn't want to think about formatting. Plain text input with newlines is what they actually need.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Persistent draft.&lt;/strong&gt; If a user starts typing and closes the app, the draft is saved. Property viewings happen in the real world; people are typing on their phone and getting interrupted. Losing notes because the session closed is the worst outcome for a feature whose value is capturing transient impressions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Timestamps on notes.&lt;/strong&gt; Each note shows when it was added. This matters when a user has multiple notes on a property across different visits or different points in the application process.&lt;/p&gt;

&lt;h2&gt;
  
  
  The reminder problem: where we're at
&lt;/h2&gt;

&lt;p&gt;Follow-up timing is the biggest behavioral lever in rental searching. The window between "attended viewing" and "property offered to someone else" is often 24-48 hours. Users who follow up in that window have much better outcomes than users who follow up on day 3.&lt;/p&gt;

&lt;p&gt;We know this. Our reminder system is still too manual. The right behavior is: status transitions to "Attended + Interested" trigger an automatic 24-hour reminder. We have the data model for reminders. We have the user intent signal (the "Interested" flag). The gap is that we haven't wired the automatic trigger.&lt;/p&gt;

&lt;p&gt;The reason it's not done yet is partly scope and partly that automatic reminders require notification permissions, which add friction to onboarding. We kept onboarding low-friction early on. Notification permission prompts add a step that some users drop off at.&lt;/p&gt;

&lt;p&gt;The resolution is probably to ask for notification permission at the point a user schedules their first viewing, when the value prop ("get reminded about this viewing") is concrete. Not at onboarding when it's abstract. That's the standard pattern for permission request timing and it applies here.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the state machine doesn't model well
&lt;/h2&gt;

&lt;p&gt;Applications are more document-intensive than viewings. For a formal application you might submit a reference letter, an employment letter, 3 months of bank statements, proof of current address, and a photo ID. Tracking which documents you've gathered, which properties you've submitted to, and whether the landlord has confirmed receipt is its own workflow.&lt;/p&gt;

&lt;p&gt;We have basic document tracking (checkboxes for standard document types) but it's not well integrated with the per-property status. This is a real gap. The users who manage applications to multiple properties simultaneously are the ones who most need this, and they're also the users with the most acute pain if they forget to submit something.&lt;/p&gt;

&lt;p&gt;It's also a potentially sensitive feature: these are financial and identity documents. Any feature that involves storing or transmitting them needs to be handled carefully. We currently don't store documents, just track whether you've gathered them. That's the right call for now.&lt;/p&gt;

&lt;p&gt;The full application management problem, including document collection and submission, is probably a separate product surface from viewing management. They're related workflows but distinct enough that bundling them into one view creates cognitive load.&lt;/p&gt;

&lt;p&gt;I mapped out the whole viewing logistics problem from the user side here: &lt;a href="https://homescout.io/guide/managing-rental-viewings-dublin-without-losing-mind" rel="noopener noreferrer"&gt;https://homescout.io/guide/managing-rental-viewings-dublin-without-losing-mind&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Caspar Bannink. Founder of HomeScout.io. Building AI-powered rental search for Dublin.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>product</category>
      <category>softwareengineering</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Why Landlords Never Reply</title>
      <dc:creator>Caspar Bannink</dc:creator>
      <pubDate>Mon, 18 May 2026 21:40:11 +0000</pubDate>
      <link>https://dev.to/caspar_bannink_3728f095d1/why-landlords-never-reply-1mka</link>
      <guid>https://dev.to/caspar_bannink_3728f095d1/why-landlords-never-reply-1mka</guid>
      <description>&lt;p&gt;The Dublin rental market has a response rate problem. The average renter sends a generic inquiry to 20+ properties and hears back from maybe 2 or 3. I measured this through user interviews and it tracks with what academic research on UK/Irish rental markets has found: response rates for cold rental inquiries sit somewhere between 12% and 20% depending on the area and season.&lt;/p&gt;

&lt;p&gt;This is a signal quality problem wrapped in a UX problem. And it's exactly the kind of thing you can build against.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the data actually looks like
&lt;/h2&gt;

&lt;p&gt;When a listing goes live on Daft.ie (the dominant Irish rental portal), it typically receives 50-150 inquiries in the first 48 hours. There's no landlord-side inbox tooling beyond email forwarding. No status tags, no filtering, no templates for responses.&lt;/p&gt;

&lt;p&gt;The landlord opens it, skims a subset, replies to whoever seems credible and convenient, and moves on. The inquiry that lands in position #78 in their inbox could be from a perfect tenant. They'll never see it.&lt;/p&gt;

&lt;p&gt;On the applicant side, the constraint is symmetric. Renters know supply is short and competition is high. Rational strategy is to apply wide. So they send the same message to 20 places. Landlords recognize this pattern and deprioritize anything that sounds like a blast email. The market equilibrium is low signal on both sides.&lt;/p&gt;

&lt;h2&gt;
  
  
  The technical problem: generic text is detectable
&lt;/h2&gt;

&lt;p&gt;From a signal processing standpoint, a landlord reading 150 emails is doing a classification task. "Does this person seem like they actually want THIS property, or are they spamming everything?" Generic emails fail that classifier.&lt;/p&gt;

&lt;p&gt;What passes the classifier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mentions something specific from the listing (the garden, the commute distance to a named workplace, the pet policy)&lt;/li&gt;
&lt;li&gt;Answers the implicit screening questions (move-in date, lease length preference, employment type)&lt;/li&gt;
&lt;li&gt;Has a coherent reason for wanting that specific location&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Writing this for each of 20 properties manually is O(n) effort. Most people won't do it. That's the gap to build against.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we built and how it works
&lt;/h2&gt;

&lt;p&gt;HomeScout has an AI email composer that generates property-specific inquiry drafts. The flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User finds a listing they want to apply to&lt;/li&gt;
&lt;li&gt;They click "Generate inquiry"&lt;/li&gt;
&lt;li&gt;The system reads the listing data (title, description, location, price, landlord-specified requirements if any)&lt;/li&gt;
&lt;li&gt;It generates a draft that references specifics from the listing and includes answers to the standard screening questions&lt;/li&gt;
&lt;li&gt;User reviews and edits before sending&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A few things we got wrong in early versions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Too formal.&lt;/strong&gt; First drafts sounded like legal correspondence. Real inquiry emails are conversational. People don't write "I would like to express my interest in the aforementioned property." They write "I'd love to come view this, I work nearby and the commute would be ideal."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hallucinating specifics.&lt;/strong&gt; Early versions made up details that weren't in the listing. "I noticed the property has a south-facing garden" when the listing said nothing about the garden's orientation. That's a problem when the landlord reads it. We added a strict constraint: only reference what's explicitly in the listing data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Missing the implicit screening questions.&lt;/strong&gt; The questions landlords actually care about (when can you move, how long do you want, are you employed) aren't always in the listing. We built a user profile layer so those answers get included from the renter's saved preferences rather than being generated per-inquiry.&lt;/p&gt;

&lt;h2&gt;
  
  
  The model choice question
&lt;/h2&gt;

&lt;p&gt;This is one of those tasks where the quality delta between models matters more than the cost delta. The difference between a mediocre and a good inquiry email is meaningful, and the output is user-facing in a context where the stakes are real (someone might get or not get a viewing based on this).&lt;/p&gt;

&lt;p&gt;We use GPT-4o for this feature. We tested smaller models. The failure modes were too common: hallucinated specifics, wrong tone, missing context from the listing. 4o is measurably better on this specific task. The cost is acceptable given the feature's value.&lt;/p&gt;

&lt;p&gt;We also considered fine-tuning on successful inquiry emails, but our dataset isn't large enough to make that worthwhile yet. It's on the roadmap once we have more throughput.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prompt engineering notes
&lt;/h2&gt;

&lt;p&gt;The prompt structure that works best for this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;System prompt establishes the persona (renter, not an AI assistant) and the tone (conversational, specific, not corporate)&lt;/li&gt;
&lt;li&gt;Listing data is passed as structured context, not raw HTML scrape&lt;/li&gt;
&lt;li&gt;User profile answers are included as a short bullet list for the model to draw from&lt;/li&gt;
&lt;li&gt;Explicit negative instructions: don't fabricate specifics, don't use formal register, don't mention that this is AI-generated&lt;/li&gt;
&lt;li&gt;Output format: plain text, no subject line, 3-4 short paragraphs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The negative instructions matter more than the positive ones. The default model behavior drifts toward formality and generic language. You have to actively constrain it away from those patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  What comes next
&lt;/h2&gt;

&lt;p&gt;The honest answer is that email composer is a partial fix. The underlying problem is that landlord inboxes have no structure, so even a good inquiry can get lost. The complete solution involves building tools on the landlord side too: structured applicant pipelines, automated first-response, viewing scheduler. That's a different product scope.&lt;/p&gt;

&lt;p&gt;For now, improving inquiry quality is the lever we can pull on the renter side. And it does move the needle. Users who use the composer report better outcomes than users who don't.&lt;/p&gt;

&lt;p&gt;I wrote a longer breakdown of the full response rate problem and its market context here: &lt;a href="https://homescout.io/guide/why-landlords-never-reply-dublin-rental" rel="noopener noreferrer"&gt;https://homescout.io/guide/why-landlords-never-reply-dublin-rental&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Caspar Bannink. Founder of HomeScout.io. Building AI-powered rental search for Dublin.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>dublin</category>
      <category>renting</category>
      <category>ireland</category>
      <category>proptech</category>
    </item>
    <item>
      <title>Architecture of a Rental Aggregator: Scraping and Normalizing 90+ Sources</title>
      <dc:creator>Caspar Bannink</dc:creator>
      <pubDate>Mon, 18 May 2026 21:39:55 +0000</pubDate>
      <link>https://dev.to/caspar_bannink_3728f095d1/architecture-of-a-rental-aggregator-scraping-and-normalizing-90-sources-1jlk</link>
      <guid>https://dev.to/caspar_bannink_3728f095d1/architecture-of-a-rental-aggregator-scraping-and-normalizing-90-sources-1jlk</guid>
      <description>&lt;p&gt;Building a rental aggregator for Dublin means pulling data from a fragmented market: one dominant portal, a handful of mid-tier sites, dozens of letting agency websites, property management company portals, and a long tail of small sources. Here's how the system is structured and where the interesting problems are.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why existing portals aren't enough
&lt;/h2&gt;

&lt;p&gt;The standard answer for finding Dublin rentals is Daft.ie. It has good coverage and solid search. The problem is that coverage isn't complete. Letting agencies list exclusively on their own websites. Some landlords use smaller portals. A non-trivial share of listings never hits Daft at all.&lt;/p&gt;

&lt;p&gt;If you're only searching Daft, you're seeing maybe 60-70% of what's available. For a renter in a tight market, that gap matters.&lt;/p&gt;

&lt;p&gt;An aggregator's value proposition is simple: search everything, show results in one place. The technical challenge is that "everything" means 90+ sources with no shared API, no standard format, and varying levels of scraping difficulty.&lt;/p&gt;

&lt;h2&gt;
  
  
  Source taxonomy
&lt;/h2&gt;

&lt;p&gt;I classify sources into tiers based on scraping approach:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 1: Structured APIs or feeds.&lt;/strong&gt; A small number of sources expose RSS feeds or have semi-public JSON endpoints. These are easy. Pull, parse, done.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 2: Consistent HTML structure.&lt;/strong&gt; Most mid-tier portals have stable enough HTML that a straightforward scraper works reliably. BeautifulSoup or Playwright depending on whether the content is server-rendered or JS-generated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 3: Letting agency sites.&lt;/strong&gt; These are the hardest. Each one is different. Some run on estate agent CMSes (Property Hive, Agentbox, PropertyBase), which gives you a consistent structure within a platform family. Others are custom-built or use generic CMS platforms with property listings bolted on. I maintain source-specific extractors for each one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 4: Social and informal.&lt;/strong&gt; Facebook Marketplace, some WhatsApp community channels that get scraped via publicly accessible links. Lower data quality, higher volume of noise.&lt;/p&gt;

&lt;h2&gt;
  
  
  The scraping layer
&lt;/h2&gt;

&lt;p&gt;Each source has a scraper that handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Discovery:&lt;/strong&gt; Finding listing URLs. Pagination, sitemap traversal, or feed parsing depending on source type.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extraction:&lt;/strong&gt; Pulling raw data from a listing page. Structured fields where available, falling back to HTML parsing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Change detection:&lt;/strong&gt; Tracking whether a listing we've already seen has changed (price update, status change, taken down).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For JS-heavy sources I use Playwright with a headless Chromium instance. The overhead is significant compared to simple HTTP requests, so I only use it where necessary. Most letting agency sites are server-rendered and much cheaper to scrape.&lt;/p&gt;

&lt;p&gt;Rate limiting is handled per source. I track request timestamps per domain and enforce minimum intervals. The last thing you want is to get blocked from a source because you hammered it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RateLimiter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;requests_per_minute&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;min_interval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;60.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;requests_per_minute&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;last&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;last&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;elapsed&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;min_interval&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;min_interval&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;elapsed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_request&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The normalization layer
&lt;/h2&gt;

&lt;p&gt;Every extractor produces a raw record. The normalization pipeline converts raw records to canonical form. I covered this in detail in the data normalization post, but the key fields are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Price: always monthly EUR, with a qualifier (exact/from/on_application)&lt;/li&gt;
&lt;li&gt;Bedrooms: integer, separated from "box room" counts&lt;/li&gt;
&lt;li&gt;Location: normalized to neighborhood + postal district + lat/lng&lt;/li&gt;
&lt;li&gt;Pet policy: boolean or null (not inferred when absent)&lt;/li&gt;
&lt;li&gt;Available date: ISO 8601 or null&lt;/li&gt;
&lt;li&gt;Source provenance: which extractor, when fetched, source URL&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The normalization step is where most of the bugs live. Edge cases in source formatting surface constantly. A source starts using a new price format. A letting agency relaunches their website with different HTML structure. I run normalization in a separate pass from extraction so I can reprocess raw records when normalization logic changes without re-scraping.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deduplication
&lt;/h2&gt;

&lt;p&gt;Cross-source deduplication is essential. The same property often appears on multiple sources. Without deduplication, a user would see the same listing three times with slightly different data.&lt;/p&gt;

&lt;p&gt;The deduplication approach is blocking plus similarity scoring:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Block by price band (+/- 10%) and geographic area (same district or adjacent)&lt;/li&gt;
&lt;li&gt;Within blocks, score similarity on address string, price, beds, baths, available date&lt;/li&gt;
&lt;li&gt;Pairs above the threshold get merged. The canonical record keeps the richest data from all matching sources.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This runs as a batch job after normalization completes for a crawl cycle.&lt;/p&gt;

&lt;h2&gt;
  
  
  Storage and freshness
&lt;/h2&gt;

&lt;p&gt;Listings go into Postgres. I keep full history: every time a listing changes, the previous state is preserved with a timestamp. This lets me track price changes over time, which surfaces useful signals (listings that drop in price are often still available and motivated to rent).&lt;/p&gt;

&lt;p&gt;Each listing has a &lt;code&gt;freshness_score&lt;/code&gt; that decays over time. The score starts high when a listing is first seen or updated, and drops on a schedule tuned to how frequently that source typically updates. Stale listings get surfaced to users with a staleness label rather than hidden completely, because they might still be available.&lt;/p&gt;

&lt;p&gt;The crawl cycle runs on a schedule per source. High-value sources (the main portals) run every 15-30 minutes. Long-tail sources run every few hours. The full catalog is refreshed within a 24-hour window at minimum.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this breaks down
&lt;/h2&gt;

&lt;p&gt;A few honest failure modes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Listings that go fast.&lt;/strong&gt; A listing can be posted and let within hours. If a source is on a 4-hour crawl cycle, the user might never see it. For the highest-priority sources I push the crawl frequency as high as the site tolerates. But there's no fix for a source that updates infrequently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data accuracy.&lt;/strong&gt; Scraped data is only as accurate as the source. Listings sometimes have wrong prices, wrong bed counts, or outdated availability dates. There's no reliable way to independently verify these without viewings. I surface source data as-is with the provenance visible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CAPTCHAs and bot detection.&lt;/strong&gt; Some sources actively block scraping. I don't fight these. If a source blocks me, I either find a publicly accessible feed or exclude that source.&lt;/p&gt;

&lt;p&gt;I wrote a more user-facing view of how the aggregation works at &lt;a href="https://homescout.io/guide/tools-find-apartment-dublin" rel="noopener noreferrer"&gt;https://homescout.io/guide/tools-find-apartment-dublin&lt;/a&gt;. This post is the technical layer underneath that.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Caspar Bannink. Founder of HomeScout.io. Building AI-powered rental search for Dublin.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>dublin</category>
      <category>renting</category>
      <category>ireland</category>
      <category>proptech</category>
    </item>
    <item>
      <title>Managing Rental Viewings</title>
      <dc:creator>Caspar Bannink</dc:creator>
      <pubDate>Mon, 18 May 2026 21:39:38 +0000</pubDate>
      <link>https://dev.to/caspar_bannink_3728f095d1/managing-rental-viewings-n2i</link>
      <guid>https://dev.to/caspar_bannink_3728f095d1/managing-rental-viewings-n2i</guid>
      <description>&lt;p&gt;Apartment hunting is a stateful workflow. At any given moment, a user might have 15 listings in various stages: some saved, some inquired about, some with scheduled viewings, some attended, some applied to, some rejected. Each has associated data: emails, notes, documents, times, decisions pending.&lt;/p&gt;

&lt;p&gt;Modeling this well is a product design problem more than a technical one. But the technical decisions you make about state structure shape what the product can actually do.&lt;/p&gt;

&lt;p&gt;This is what I've learned building the application tracker for HomeScout.&lt;/p&gt;

&lt;h2&gt;
  
  
  The state machine
&lt;/h2&gt;

&lt;p&gt;The viewing/application workflow has a natural state machine structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Saved → Inquired → Viewing Scheduled → Attended → Applied → Outcome
                                              ↓                  ↓
                                          Not Interested     Accepted / Rejected
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With additional states like "Rescheduled" (a substate of "Viewing Scheduled") and "Waiting for references" (a substate of "Applied").&lt;/p&gt;

&lt;p&gt;The question is how much of this to expose to the user. Early versions exposed the full pipeline with all substates. Users found it overwhelming. They didn't want to maintain an accurate state machine; they wanted a rough picture of where things stood.&lt;/p&gt;

&lt;p&gt;Current version has seven states: Saved, Contacted, Viewing Scheduled, Viewed, Applied, Offer, Closed (rejected/accepted/withdrawn). That's enough structure to give an overview without requiring micro-updates to every state transition.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data model decisions that matter
&lt;/h2&gt;

&lt;p&gt;Each tracked listing needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reference to the canonical listing (price, location, photos, specs)&lt;/li&gt;
&lt;li&gt;Viewing datetime (nullable, set when scheduled)&lt;/li&gt;
&lt;li&gt;User notes (free text, the most-used field)&lt;/li&gt;
&lt;li&gt;Status enum&lt;/li&gt;
&lt;li&gt;Last-action timestamp (for sorting the list by recency of activity)&lt;/li&gt;
&lt;li&gt;Reminder datetime (nullable, for follow-up nudges)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The canonical listing reference is tricky. Listings on rental portals disappear when let. If a user is tracking an application and the listing goes dead, you need to have captured the listing data rather than holding a reference to a URL that will 404. (Most renters who are tracking 10+ properties simultaneously are also dealing with the speed problem — &lt;a href="https://homescout.io/guide/free-rental-alerts-dublin" rel="noopener noreferrer"&gt;setting up real-time alerts&lt;/a&gt; is what keeps the pipeline from going stale before you can even schedule viewings.)&lt;/p&gt;

&lt;p&gt;We snapshot the listing data at the point of save: title, price, address, description, photos (or at least photo URLs, with the understanding they may eventually 404). This creates storage overhead but is necessary for the tracker to remain useful after a listing closes.&lt;/p&gt;

&lt;p&gt;The alternative (storing only a URL) fails silently in the worst way: the user's tracker looks fine but clicking through gives them nothing, exactly when they're trying to reference details before an interview or application.&lt;/p&gt;

&lt;h2&gt;
  
  
  The calendar view problem
&lt;/h2&gt;

&lt;p&gt;Showing upcoming viewings in a calendar seems straightforward. The complications:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Timezone.&lt;/strong&gt; Users searching from abroad (a meaningful chunk of our user base) may have their device in a different timezone than Dublin. We store viewing datetimes in UTC with Ireland timezone display. We show the timezone explicitly in the viewing confirmation so remote users know what "Thursday 2pm" means in their local time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conflict detection.&lt;/strong&gt; Users sometimes accidentally schedule two viewings at the same time, or with insufficient travel time between them. We flag conflicts (exact time overlap) but don't try to flag "you've booked two viewings 15 minutes apart and they're across the city." That requires knowing travel time between the two viewing addresses, which we could calculate but haven't yet. Simple conflict detection catches the obvious case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Duration assumption.&lt;/strong&gt; We default to 30 minutes per viewing block in the calendar. Most viewings are shorter, some run longer. We let users edit duration but most don't. The 30-minute default gives enough visual separation in the calendar to make conflicts obvious.&lt;/p&gt;

&lt;h2&gt;
  
  
  Notes: the underrated feature
&lt;/h2&gt;

&lt;p&gt;Post-viewing notes are the feature users engage with most, measured by time spent interacting. They enter them right after a viewing (sometimes before they've left the building) and come back to them when making a decision a week later.&lt;/p&gt;

&lt;p&gt;The design decisions that made this work better:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No formatting.&lt;/strong&gt; Early version had a rich text editor. Users wanted to type fast and didn't want to think about formatting. Plain text input with newlines is what they actually need.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Persistent draft.&lt;/strong&gt; If a user starts typing and closes the app, the draft is saved. Property viewings happen in the real world; people are typing on their phone and getting interrupted. Losing notes because the session closed is the worst outcome for a feature whose value is capturing transient impressions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Timestamps on notes.&lt;/strong&gt; Each note shows when it was added. This matters when a user has multiple notes on a property across different visits or different points in the application process.&lt;/p&gt;

&lt;h2&gt;
  
  
  The reminder problem: where we're at
&lt;/h2&gt;

&lt;p&gt;Follow-up timing is the biggest behavioral lever in rental searching. The window between "attended viewing" and "property offered to someone else" is often 24-48 hours. Users who follow up in that window have much better outcomes than users who follow up on day 3.&lt;/p&gt;

&lt;p&gt;We know this. Our reminder system is still too manual. The right behavior is: status transitions to "Attended + Interested" trigger an automatic 24-hour reminder. We have the data model for reminders. We have the user intent signal (the "Interested" flag). The gap is that we haven't wired the automatic trigger.&lt;/p&gt;

&lt;p&gt;The reason it's not done yet is partly scope and partly that automatic reminders require notification permissions, which add friction to onboarding. We optimized onboarding for low friction early on. Notification permission prompts add a step that some users drop off at.&lt;/p&gt;

&lt;p&gt;The resolution is probably to ask for notification permission at the point a user schedules their first viewing, when the value prop ("get reminded about this viewing") is concrete. Not at onboarding when it's abstract. That's the standard pattern for permission request timing and it applies here.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the state machine doesn't model well
&lt;/h2&gt;

&lt;p&gt;Applications are more document-intensive than viewings. For a formal application you might submit a reference letter, an employment letter, 3 months of bank statements, proof of current address, and a photo ID. Tracking which documents you've gathered, which properties you've submitted to, and whether the landlord has confirmed receipt is its own workflow.&lt;/p&gt;

&lt;p&gt;We have basic document tracking (checkboxes for standard document types) but it's not well integrated with the per-property status. This is a real gap. The users who manage applications to multiple properties simultaneously are the ones who most need this, and they're also the users with the most acute pain if they forget to submit something.&lt;/p&gt;

&lt;p&gt;It's also a potentially sensitive feature: these are financial and identity documents. Any feature that involves storing or transmitting them needs to be handled carefully. We currently don't store documents, just track whether you've gathered them. That's the right call for now.&lt;/p&gt;

&lt;p&gt;The full application management problem, including document collection and submission, is probably a separate product surface from viewing management. They're related workflows but distinct enough that bundling them into one view creates cognitive load.&lt;/p&gt;

&lt;p&gt;I mapped out the whole viewing logistics problem from the user side here: &lt;a href="https://homescout.io/guide/managing-rental-viewings-dublin-without-losing-mind" rel="noopener noreferrer"&gt;https://homescout.io/guide/managing-rental-viewings-dublin-without-losing-mind&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Caspar Bannink. Founder of HomeScout.io. Building AI-powered rental search for Dublin.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>dublin</category>
      <category>renting</category>
      <category>ireland</category>
      <category>proptech</category>
    </item>
    <item>
      <title>Building a Real-Time Listing Alert System: Polling, Webhooks, and Monitoring 90+ Sites</title>
      <dc:creator>Caspar Bannink</dc:creator>
      <pubDate>Mon, 18 May 2026 21:39:19 +0000</pubDate>
      <link>https://dev.to/caspar_bannink_3728f095d1/building-a-real-time-listing-alert-system-polling-webhooks-and-monitoring-90-sites-2l6i</link>
      <guid>https://dev.to/caspar_bannink_3728f095d1/building-a-real-time-listing-alert-system-polling-webhooks-and-monitoring-90-sites-2l6i</guid>
      <description>&lt;p&gt;The alert system is the most user-critical feature in a rental aggregator. Users care most about being notified quickly when something matches their criteria. Here's how I built it, what I evaluated first, and where the interesting tradeoffs are.&lt;/p&gt;

&lt;h2&gt;
  
  
  The options: webhooks, RSS, polling
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Webhooks&lt;/strong&gt; would be ideal. Source site posts a listing, fires a webhook to your system, you process and alert within seconds. Zero wasted requests, minimal latency.&lt;/p&gt;

&lt;p&gt;Reality: almost no rental portals expose webhooks. This isn't a technical limitation on their end. They just haven't built it, and in many cases scraping is technically prohibited by their ToS. The webhook path is effectively unavailable for most sources.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RSS feeds&lt;/strong&gt; are available from a small number of sources. Daft used to have them. A few smaller sites still publish them. Where they exist they're great: structured, cacheable, low overhead. But coverage is limited.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Polling&lt;/strong&gt; is what you actually use. You hit each source on a schedule, parse the results, diff against what you've already seen, and trigger alerts for new listings. It's the slowest and most resource-intensive option, but it's the one that works across all sources.&lt;/p&gt;

&lt;p&gt;The system design challenge with polling is making it fast enough to be useful without hammering sources.&lt;/p&gt;

&lt;h2&gt;
  
  
  The polling architecture
&lt;/h2&gt;

&lt;p&gt;Each source has a crawler that runs on a schedule. Crawlers are classified by priority:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High-priority sources&lt;/strong&gt; (main portals, high listing volume): crawl every 15-30 minutes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Medium-priority sources&lt;/strong&gt; (secondary portals, mid-tier agencies): crawl every 1-2 hours&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low-priority sources&lt;/strong&gt; (small agencies, long-tail sites): crawl every 4-12 hours&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Within a crawl, the crawler fetches the listing index (search results page or feed), extracts listing IDs and basic metadata, and compares against the stored state. Only changed or new listings trigger a full extraction. This keeps full-page requests proportional to the change rate, not the crawl rate.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;crawl_source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SourceConfig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;RawListing&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="c1"&gt;# Fetch listing index (search results or feed)
&lt;/span&gt;    &lt;span class="n"&gt;index_items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Load stored state for this source
&lt;/span&gt;    &lt;span class="n"&gt;stored&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_listing_ids&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;stored_set&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;external_id&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stored&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;new_ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;external_id&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;index_items&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;stored_set&lt;/span&gt;

    &lt;span class="c1"&gt;# Only fetch full listing pages for new listings
&lt;/span&gt;    &lt;span class="n"&gt;new_listings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;index_items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;external_id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;new_ids&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;listing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch_full_listing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;new_listings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;listing&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;new_listings&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach means most crawl cycles result in zero full-page fetches. The overhead scales with the listing change rate of the source, not the total inventory size.&lt;/p&gt;

&lt;h2&gt;
  
  
  Diffing and change detection
&lt;/h2&gt;

&lt;p&gt;When a listing is fetched, I store a content hash alongside the structured data. On subsequent crawls:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Same hash: no change, skip&lt;/li&gt;
&lt;li&gt;Different hash: fetch full listing, update stored record, check if the change affects any active alerts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Price changes get special handling. A listing that drops in price might not be "new" but it becomes relevant to users who were previously priced out. The alert system checks price-sensitive alerts when price changes are detected, not just on new listings.&lt;/p&gt;

&lt;p&gt;Listings that disappear from the index are marked as potentially taken. I don't immediately remove them from user views because sources sometimes temporarily de-list listings without them being actually let. I wait for two consecutive crawl cycles where the listing is absent before marking it as gone and stopping alerts on it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The alert matching engine
&lt;/h2&gt;

&lt;p&gt;Users set up saved searches with typed criteria: price range, beds, area, transport proximity, pet policy, and optionally a freetext description preference.&lt;/p&gt;

&lt;p&gt;When new listings come in, each one is scored against all active saved searches. The matching is two-stage:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Structured filter check&lt;/strong&gt; (hard pass/fail): does the listing meet the hard criteria (price, beds, area)? If not, stop. This is fast and handles most rejections.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Soft preference scoring&lt;/strong&gt; (for listings that pass the filter): score the listing against any freetext preferences using the embedding similarity approach described in the search article. Listings above a similarity threshold trigger an alert. Listings below it go into a "possible match" digest rather than an instant alert.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This prevents freetext preferences from flooding users with weak matches while still surfacing them at lower priority.&lt;/p&gt;

&lt;h2&gt;
  
  
  Delivery and deduplication
&lt;/h2&gt;

&lt;p&gt;Alerts go via email by default. The delivery layer handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Per-user rate limiting:&lt;/strong&gt; a user with five saved searches shouldn't get fifteen emails in a minute if fifteen listings come in. I batch alerts within a short window and send a digest.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-search deduplication:&lt;/strong&gt; the same listing can match multiple saved searches for the same user. It gets mentioned once in the alert, with a note about which searches it matched.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Platform deduplication:&lt;/strong&gt; the same physical property sometimes appears on multiple sources. After normalization and deduplication, only one record per property exists. Alert matching happens on deduplicated records, so users don't get alerted twice for the same apartment.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Where the latency actually lives
&lt;/h2&gt;

&lt;p&gt;The end-to-end latency from a listing being posted to a user getting an alert is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Time until next crawl of that source (0-30 minutes for high-priority sources)&lt;/li&gt;
&lt;li&gt;Extraction and normalization time (~seconds)&lt;/li&gt;
&lt;li&gt;Alert matching time (~seconds for the active alert set)&lt;/li&gt;
&lt;li&gt;Email delivery (~seconds to minutes depending on provider)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For high-priority sources the realistic alert latency is 5-35 minutes. That's competitive with Daft's own alert system, and it covers sources Daft doesn't.&lt;/p&gt;

&lt;p&gt;The gap I haven't solved: if a listing goes live and gets taken at 2:00am while the next crawl is at 2:20am, users will get an alert for something that's already gone. This is a fundamental limitation of polling-based systems. I note it clearly in the product rather than pretending it doesn't happen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Infrastructure
&lt;/h2&gt;

&lt;p&gt;The crawl jobs run as async workers. I use a task queue (Celery with Redis as broker) to schedule crawls, with priority queues for high-priority sources. The queue approach means I can add crawl workers horizontally when source count grows.&lt;/p&gt;

&lt;p&gt;Postgres handles all storage. The listings table has a GIN index on the JSONB amenities column, and standard B-tree indexes on price, beds, area, and source. Alert matching queries run in under 100ms for the current dataset size.&lt;/p&gt;

&lt;p&gt;I wrote a user-facing version of how alerts work at &lt;a href="https://homescout.io/guide/free-rental-alerts-dublin" rel="noopener noreferrer"&gt;https://homescout.io/guide/free-rental-alerts-dublin&lt;/a&gt;. This post is the internals behind it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Caspar Bannink. Founder of HomeScout.io. Building AI-powered rental search for Dublin.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>dublin</category>
      <category>renting</category>
      <category>ireland</category>
      <category>proptech</category>
    </item>
    <item>
      <title>Finding Apartment From Abroad</title>
      <dc:creator>Caspar Bannink</dc:creator>
      <pubDate>Mon, 18 May 2026 21:39:03 +0000</pubDate>
      <link>https://dev.to/caspar_bannink_3728f095d1/finding-apartment-from-abroad-5081</link>
      <guid>https://dev.to/caspar_bannink_3728f095d1/finding-apartment-from-abroad-5081</guid>
      <description>&lt;p&gt;When I looked at who was actually signing up to HomeScout in the early months, the timezone distribution surprised me. A meaningful chunk of users were browsing and setting up alerts from outside Ireland entirely. Netherlands, Germany, Australia, the US. People who needed a Dublin apartment but weren't in Dublin yet.&lt;/p&gt;

&lt;p&gt;That cohort has a fundamentally different set of constraints than a local searcher. Building for them forced some decisions I wouldn't have made otherwise.&lt;/p&gt;

&lt;h2&gt;
  
  
  The core UX problem: latency on a fast-moving market
&lt;/h2&gt;

&lt;p&gt;Dublin rental listings have a short half-life. Something priced fairly in a good area will get 50-100 inquiries in 24 hours and be gone in 48. The search UX that works for a local user (check the app a few times a day, act when something looks good) fails completely for someone who's 8 hours behind or 10 hours ahead.&lt;/p&gt;

&lt;p&gt;The design response to this is: push the user toward a decision earlier. The alert has to fire fast enough and contain enough information to act on immediately, without requiring a second step of "open the app and check."&lt;/p&gt;

&lt;p&gt;Our alerts include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full listing title, address, price&lt;/li&gt;
&lt;li&gt;Bedrooms, BER rating, available date&lt;/li&gt;
&lt;li&gt;Commute time to the user's saved workplace (calculated at send time, not load time)&lt;/li&gt;
&lt;li&gt;Direct link to both the HomeScout detail page and the original source listing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is that a user in a different timezone, woken up by a push notification at 6am their time, can make a go/no-go decision before they're even out of bed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Commute calculation as a core feature, not a filter
&lt;/h2&gt;

&lt;p&gt;Most rental search tools treat commute as a filter you can optionally apply. You draw a radius or specify a max journey time. That's fine if you know the city's transport geography.&lt;/p&gt;

&lt;p&gt;If you've never been to Dublin, you don't know that two neighborhoods that look equally close to city center on a map can have radically different commute times due to transport infrastructure. You don't know that the coastal DART line is reliable and fast, while a bus-dependent area of the same distance might add 30+ minutes to a commute.&lt;/p&gt;

&lt;p&gt;We built commute as a first-class search parameter. You type your workplace, and the UI shows commute time alongside price and size rather than buried in a filter panel. When a user hasn't set a workplace, we prompt them to do it during onboarding rather than after they've already run a search.&lt;/p&gt;

&lt;p&gt;The data layer: we use a combination of routing APIs for public transport journey planning. The challenge is that "commute time" depends on departure time, and users searching from abroad might not know what a realistic Dublin rush-hour departure looks like. We calculate using a standard weekday morning window (7:30-9:00am) and show that assumption explicitly rather than pretending we're giving a precise number.&lt;/p&gt;

&lt;h2&gt;
  
  
  Neighbourhood context without having visited
&lt;/h2&gt;

&lt;p&gt;If I search for apartments in a city I've never been to and the listing says "vibrant area close to all amenities," I learn nothing useful. That phrase appears in listings in both the nicest and least nice streets in Dublin.&lt;/p&gt;

&lt;p&gt;We built neighborhood context cards that try to surface genuinely useful information: the character of the area, the noise profile (especially relevant near nightlife strips), transport options, and rough price positioning relative to comparable areas.&lt;/p&gt;

&lt;p&gt;The hard part is that this information is qualitative and changes. What was "up-and-coming" in 2022 may be settled now, or may still be rough depending on which block you're on. We've had to make editorial choices about how to characterize areas rather than purely relying on scraped data, and those choices need updating.&lt;/p&gt;

&lt;p&gt;One approach we use: cross-referencing listing data with the distribution of listing prices and turnover rates in each area. Areas where listings disappear fast at or above asking price signal high actual demand. Areas with more inventory that sits longer tell a different story. This is less subjective than editorial characterization and updates automatically as new listings flow in.&lt;/p&gt;

&lt;h2&gt;
  
  
  The credibility problem for remote applicants
&lt;/h2&gt;

&lt;p&gt;This one isn't a technical problem, it's a product problem. Landlords sometimes prefer local applicants because they can meet them. A remote applicant asking for special consideration (skip the in-person viewing, sign before arriving) is asking for trust they haven't established.&lt;/p&gt;

&lt;p&gt;The product response is helping users communicate their situation well in their first inquiry. Remote renters who state their arrival date, employment situation, and reason for the remote search upfront get better outcomes. The inquiry email composer we built surfaces a "remote applicant" context option that prompts users to include that information.&lt;/p&gt;

&lt;p&gt;What we don't have yet: any kind of verified identity or background check layer that would let a landlord have more confidence in a remote applicant. That's a real gap. It would probably require partnering with a tenant referencing service, which exists in the Irish market but adds friction that most users in early search stage aren't ready for.&lt;/p&gt;

&lt;h2&gt;
  
  
  Timing features vs timezone detection
&lt;/h2&gt;

&lt;p&gt;We debated auto-detecting user timezone and adjusting UI accordingly (surfacing listings that were just posted vs. listings from hours ago, for example). We haven't built this. The complexity of getting timezone-aware UX right without making it feel weird to local users isn't worth it at our current scale.&lt;/p&gt;

&lt;p&gt;What we did instead: show "listed X hours ago" very prominently on every listing card. This is simple and tells the remote user the one thing they most need to know: is this listing still worth pursuing or has it probably already gone?&lt;/p&gt;

&lt;p&gt;If a listing is 6+ hours old in a competitive area, a remote user should know that their odds of getting a reply are significantly lower. Making that information front-and-center changes their behavior appropriately. They focus on the fresh listings and set up alerts for new ones rather than chasing cold ones.&lt;/p&gt;

&lt;p&gt;I wrote a longer guide on the full remote apartment-hunting process in Dublin here: &lt;a href="https://homescout.io/guide/finding-dublin-apartment-from-another-country" rel="noopener noreferrer"&gt;https://homescout.io/guide/finding-dublin-apartment-from-another-country&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Caspar Bannink. Founder of HomeScout.io. Building AI-powered rental search for Dublin.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>dublin</category>
      <category>renting</category>
      <category>ireland</category>
      <category>proptech</category>
    </item>
    <item>
      <title>Dont Know Where To Live</title>
      <dc:creator>Caspar Bannink</dc:creator>
      <pubDate>Mon, 18 May 2026 21:38:48 +0000</pubDate>
      <link>https://dev.to/caspar_bannink_3728f095d1/dont-know-where-to-live-508c</link>
      <guid>https://dev.to/caspar_bannink_3728f095d1/dont-know-where-to-live-508c</guid>
      <description>&lt;p&gt;One of the first onboarding questions we tried for HomeScout was: "Which area of Dublin are you looking in?"&lt;/p&gt;

&lt;p&gt;It failed. A large percentage of users either skipped it, answered "anywhere," or picked one area then spent their first session clearly confused about why what they wanted wasn't there.&lt;/p&gt;

&lt;p&gt;The issue was that we'd assumed users had a mental model of Dublin they didn't have. Asking "where in Dublin" when someone has never been to Dublin is roughly like asking "which part of the API are you trying to call" when someone is writing their first web request. The question assumes foundational knowledge that doesn't exist yet.&lt;/p&gt;

&lt;p&gt;Fixing this required rethinking what "finding a neighbourhood" actually means computationally and from a UX standpoint.&lt;/p&gt;

&lt;h2&gt;
  
  
  The actual decision structure
&lt;/h2&gt;

&lt;p&gt;Choosing where to live involves several loosely connected decisions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What areas are reachable given my commute constraint?&lt;/li&gt;
&lt;li&gt;Among those, which have apartments in my price range?&lt;/li&gt;
&lt;li&gt;Among those, which match my lifestyle preferences (noise level, walkability, urban/suburban feel)?&lt;/li&gt;
&lt;li&gt;Among those, which have supply available right now?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Most rental search tools treat this as a map filter: draw a shape, see listings inside it. That works if you already know which shape to draw. It fails for new-to-city users because the shape-drawing step requires domain knowledge they don't have.&lt;/p&gt;

&lt;p&gt;The alternative framing: accept commute and price as hard constraints, treat lifestyle preference as a soft ranking signal, and generate a candidate neighborhood list rather than asking the user to specify one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Commute as constraint: the implementation
&lt;/h2&gt;

&lt;p&gt;Commute calculation is the most technically interesting part of this. The naive approach (radius from a point) is wrong for Dublin because the transport network is highly directional. A 5km radius from a workplace in the south docks includes areas with 15-minute DART connections and areas with 50-minute bus rides.&lt;/p&gt;

&lt;p&gt;We calculate commute using routing APIs with a morning peak window (7:30-9:00am departure, weekday). The result is a travel-time isochrone: a set of points reachable from the workplace within a given time limit.&lt;/p&gt;

&lt;p&gt;A few things we learned doing this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mode matters more than distance.&lt;/strong&gt; The DART covers its route fast. Areas on the DART line that look far on a map are closer in real commute terms than areas that look near but require a bus connection. The isochrone algorithm captures this naturally, but it surprised users in early testing who expected proximity on a map to correlate with commute time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The isochrone edge is fuzzy.&lt;/strong&gt; A listing 300 meters outside a "45-minute commute" isochrone might actually be fine depending on departure time variance, walking pace, and platform dwell time. We show commute estimates with a "approximately" qualifier and don't hard-clip results at the edge.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Public transport is time-of-day dependent.&lt;/strong&gt; We calculate for morning peak because that's the binding constraint for most users. Some users commute against peak or work flexible hours. We have a preference for this but don't over-engineer it since most users don't fill out detailed schedule information.&lt;/p&gt;

&lt;h2&gt;
  
  
  Neighborhood characterization: the data problem
&lt;/h2&gt;

&lt;p&gt;Commute filters get you from 90+ possible areas to maybe 8-15. The next step is helping users evaluate those areas without visiting them.&lt;/p&gt;

&lt;p&gt;The data sources for neighborhood characterization that are actually useful:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Listing density and price distribution.&lt;/strong&gt; An area with 50 listings in a price band signals high supply. An area with 3 listings signals either high demand (things disappear fast) or low fit with the price range. We show listing volume alongside price median. For context on which Dublin neighborhoods actually come out cheapest for tech workers specifically, the &lt;a href="https://homescout.io/guide/best-areas-dublin-tech-workers" rel="noopener noreferrer"&gt;best areas for tech workers guide&lt;/a&gt; has the breakdown.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Listing age distribution.&lt;/strong&gt; Areas where listings disappear in under 24 hours have higher real demand than areas where the same listings sit for a week. This isn't available from portals directly, but we can observe it from our scraping cadence. If we see a listing at scrape time T and it's gone at T+24h, we record that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Price trends over scraping history.&lt;/strong&gt; If a neighborhood's median listing price has increased 8% in the last 6 months, that's a signal about demand pressure that a static price display doesn't capture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Editorial character data.&lt;/strong&gt; Some things can't be inferred from listing data. Noise profile, feel at night, walkability, community character. We wrote these manually for each main neighborhood based on a combination of direct experience, user feedback, and cross-referencing sources like local community boards and Google Maps review patterns. This doesn't scale infinitely but covers the 25-30 neighborhoods that account for the majority of Dublin rental activity.&lt;/p&gt;

&lt;h2&gt;
  
  
  The recommendation UI decision
&lt;/h2&gt;

&lt;p&gt;Early version: we generated a ranked list of neighborhoods and showed it as "recommended for you."&lt;/p&gt;

&lt;p&gt;Users didn't trust it. The main piece of feedback was: "How did you decide this?" The ranking was opaque and felt arbitrary even when it was correct.&lt;/p&gt;

&lt;p&gt;Current version: we show neighborhoods as a filterable grid with the relevant stats visible (median price, commute time, estimated availability, character tags). Users apply their own judgment. The system does the data work; the user does the deciding.&lt;/p&gt;

&lt;p&gt;This is less impressive as a product demo. It doesn't have a "magic recommendation" moment. But retention after first search is better, because users feel like they made an informed decision rather than following a machine.&lt;/p&gt;

&lt;p&gt;The lesson: for decisions with high personal weight (where you live for the next year), users want tools that support their judgment, not replace it. This is a domain where explainable over clever is the right call.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's still wrong
&lt;/h2&gt;

&lt;p&gt;The characterization data goes stale. Dublin neighborhoods change. The manual editorial layer needs periodic updates and we don't have a good automated signal for when a neighborhood's character has shifted enough to warrant a re-write.&lt;/p&gt;

&lt;p&gt;The commute data is only as good as the routing API, which uses GTFS data for public transport. GTFS data for Dublin Bus is notoriously incomplete and sometimes out of date. We supplement it with real journey time validation, but it's a moving target.&lt;/p&gt;

&lt;p&gt;User preference capture is still too shallow. "Quiet area" means different things to different people. We ask a handful of questions but haven't built anything sophisticated enough to actually model lifestyle fit with accuracy. That's probably the next thing worth investing in.&lt;/p&gt;

&lt;p&gt;The renter-facing version of this neighborhood analysis is here: &lt;a href="https://homescout.io/guide/moving-dublin-dont-know-where-to-live" rel="noopener noreferrer"&gt;https://homescout.io/guide/moving-dublin-dont-know-where-to-live&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Caspar Bannink. Founder of HomeScout.io. Building AI-powered rental search for Dublin.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>dublin</category>
      <category>renting</category>
      <category>ireland</category>
      <category>proptech</category>
    </item>
  </channel>
</rss>
