Leave a comment below to introduce yourself! You can talk about what brought you here, what you're learning, or just a fun fact about yourself.
Reply to someone's comment, either with a question or just a hello. 👋
Come back next week to greet our new members so you can one day earn our Warm Welcome Badge!

Top comments (583)
Welcome everyone and thanks for stopping by!
I have been on DEV for half a year now! The community has been really supportive, and I am glad you are here, whether you are new to DEV or recently came back!
If you want to get the most out of DEV, I recommend reading my article on getting started!
Get Started on Dev.to! A Beginner's Guide to Engage with the Community! 💡
I recommend reading other articles from the DEV community as well below:
dev.to/help/community-resources
Thanks for stopping by! Hope your journey goes well on DEV :D
Hello! I'm a beginner Front-End dev and a technology enthusiast with a love for dark aesthetics, creativity, and digital art. I'm so excited to join this community, improve my skills, and connect with people who love tech as much as I do. I hope to turn ideas into something beautiful through code. 🖤
Hello!! We're in the same boat, but I'm full-stack with a detour into the AI space.
Hailoooo!!!
Hello to you too
Hello and welcome! It's cool to see your passion for art and aesthetics!
hi
Hello KAT, nice intro, have a nice day
Hi Kat ;3, welcome to DEV too!!
Hey KAT! Welcome to DEV! :D
Hey everyone 👋
I'm Sergey, a Full Stack Developer passionate about AI, open
source, and developer tooling.
Currently building with Node.js, Vanilla JS, Express,
Playwright, and AI-assisted workflows through Claude Code 🚀
Excited to learn, share, and connect with the developer
community here 🙌
Hello! I'm a student and a beginner in MERN Stack web development, and I'm interested in AI/ML. I am excited to join this community, connect with others, and improve my skills and knowledge of technologies. I hope to learn, aspire, and create something with my knowledge and skills.
Thanks for the introduction Mohd! Welcome to DEV and glad you are here! Any projects you are working on?
Hello
hello
Hey Sami! Welcome!
Hi
Welcome Welcome :D
Hello!! I'm a backend Go/Python developer. I care about backend architecture, testing, and keeping application code simple. Glad to be here and looking forward to reading and sharing more. 👋
Welcome to DEV Vlad!
hello
Hi!
Hey everyone 👋
I'm Sergey, a Full Stack Developer passionate about AI, open
source, and developer tooling.
Currently building with PHP, Go, Node.js, Vanilla JS, Express,
Playwright, and AI-assisted workflows through Claude Code 🚀
Portfolio: fighter90.github.io/
Excited to learn, share, and connect with the developer
community here 🙌
Hi! I'm German, solo founder from Buenos Aires. I build post-quantum security infrastructure.
What I'm working on:
FIPSign: a signing API built on ML-DSA-65 (NIST FIPS 204), the standard NIST finalized in August 2024 to replace RS256/ES256. Runs on Cloudflare Workers + Durable Objects. No infra to manage, no keys to rotate: just call /sign and get a quantum-resistant token.
The quantum threat isn't here yet, but migration takes years, and the standards are already finalized. I'd rather help developers act now than scramble later.
Happy to connect with anyone building in the security or developer tools space! 👋
Welcome German. FIPSign is one of those bets that looks too early until suddenly it isn't, agents minting signed tokens at scale will eventually need PQ-ready signing whether the spec is ready or not. Curious what your early pull looks like: compliance-driven (gov, finance) or developer teams trying to future-proof? That distinction usually decides the whole positioning.
Thanks Valentin — early pull is definitely developer-led. The compliance buyers (gov, finance) need sales cycles and certifications we don't have yet — that's honest.
What we're seeing is developers who want to future-proof before it becomes urgent. The 'why not do this now while migration is cheap' argument. Those developers become the compliance story later — they're the ones who already have PQ signing in production when the audit comes.
The agent angle you raise is interesting and honestly undersold in our messaging. FIPSign already handles it natively — an agent is just another sub. Sign the action, revoke the token if the agent is compromised, verify at scale. No changes to the API. Worth a dedicated post.
The 'developers who become the compliance story later' framing is sharper than the average market positioning. Most PQ pitches I see still lead with NIST mandate timelines, which puts you on the same defensive ground as the slow-moving compliance vendors. The 'cheap to migrate now' angle gives you a different buyer entirely. On the agent post: I'd read it. The sign-action-then-revoke-token-if-compromised model is closer to how agents actually fail in production than most security framings I see.
Exactly the framing I was trying to land — glad it reads that way. The agent post is coming, will go deeper on the failure modes you mentioned. The revoke-on-compromise model is simpler than most teams expect.
The revoke flow is one half. The harder one is detection latency, since revoke after a week of undetected compromise just shrinks an already-open window. Short TTLs plus aggressive rotation usually beat engineering the perfect revocation path. Curious how the agent post frames that side.
Detection is outside the crypto layer — no signing system solves that, it's on the integration side. What FIPSign gives you is the token.signed webhook stream so you can build anomaly detection on top. TTL discipline is the mitigation while detection is imperfect. Longer term, per-agent usage patterns from that stream could feed alerts natively — but that's not there yet.
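For anyone following along, here is a minimal Python sketch of the "short TTL plus revoke" idea from this exchange. It skips signature verification entirely (ML-DSA is not in the standard library), and `Token`, `RevocationList`, `is_acceptable`, and `max_exposure` are hypothetical names for illustration, not FIPSign's API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Token:
    sub: str          # the agent or user the token was issued to
    issued_at: float  # Unix seconds
    ttl: float        # lifetime in seconds, kept short on purpose


@dataclass
class RevocationList:
    # Assumption: revocation is tracked per subject, not per token.
    revoked_subs: set = field(default_factory=set)

    def revoke(self, sub: str) -> None:
        self.revoked_subs.add(sub)


def is_acceptable(token: Token, revocations: RevocationList, now: float) -> bool:
    """Reject expired tokens and tokens whose subject has been revoked."""
    expired = now >= token.issued_at + token.ttl
    return not expired and token.sub not in revocations.revoked_subs


def max_exposure(ttl: float, detection_delay: float) -> float:
    """Worst-case seconds a stolen token stays usable.

    Revocation only helps once the compromise is detected, so the window
    is bounded by whichever comes first: expiry or detection.
    """
    return min(ttl, detection_delay)
```

With a 5-minute TTL and a week of undetected compromise, `max_exposure(300, 7 * 24 * 3600)` is 300 seconds per stolen token, which is the sense in which TTL discipline covers for imperfect detection.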
Wrote the agent post — went deeper on the detection gap you raised, including how to wire token.rejected webhooks to auto-revoke. [dev.to/pqbuilder/how-to-secure-ai-...]
Hi German! really interesting project, would love to discuss it further.
Thanks Ahmed! Happy to connect — what are you building? Always interested in talking to people in the security space.
I built a plug-and-play monitoring tool that helps SaaS founders and startups by continuously monitoring their external attack surface (SaaS, domains/subdomains, public IP addresses, email addresses). It alerts them to issues and gives plain-English instructions on how to fix them, ranked by severity. It also has other features such as asset management, risk scoring, and reporting.
That's a great fit for the same audience I'm building for. I'm working on FIPSign — post-quantum signing API, ML-DSA-65 (NIST FIPS 204). Developers use it to sign tokens, orders, and documents without managing crypto infrastructure. Warin detects the exposure, FIPSign hardens the signatures — feels like they could sit in the same stack. Would love to know if post-quantum readiness is coming up with your users since NIST finalized the standards in 2024.
Great idea, and I applaud your proactivity. Most people in the space are not up to date on post-quantum threats or focused on readiness. And yes, both products are complementary.
Thanks Ahmed — agreed on the awareness gap, most teams won't move until it's urgent. Would love to stay in touch as both products evolve.
Here is my X handle, @CyberZorr0. Let's connect :)
Hi, I'm Izzy! SWE with a focus on React and Spring Boot. After creating vlogs for my YouTube channel, I’ve decided to also start writing blogs because content is consumed in different forms.
So... yeah, that’s the backstory behind me joining DEV.
Hey Izzy! That’s a nice reason to start writing, honestly. I recently started writing too while learning, and I felt sometimes writing explains things differently than videos do. So I can relate. What kind of blogs are you planning to write here?
I'm going to be honest.... I have no clue LOL. I might mix it up, from personal SWE-related content to somewhat educational.
Haha, fair......
I feel like most dev blogs start with "I have a content plan" and end with "here's the weird thing I spent 4 hours debugging yesterday".
Welcome Izzy. The shift from video to written content is a different muscle entirely, different rhythm, different feedback loop. What pulled you toward writing on top of the YouTube work?
Honestly, I already have multiple videos on YouTube that I could turn into blogs, so both can be created almost at the same time: I just copy the transcript of my vlogs and turn it into a blog post.
It's one of those "why not?" things. So yeah, I'm here blogging now for the vibes.
Smart reuse, the spoken structure usually has a rhythm pure written drafts miss. The thing to watch is that verbal tics translate poorly without the cadence behind them. Do you scrub them in the copy-paste pass or keep some in for voice?
Welcome to DEV....!!!
Appreciate that fam!
Hi! I'm Pradaksha. I'm an analyst currently learning Java and AI/ML.
Excited to learn, build projects, and connect with the DEV community!
Hi! I'm Max, an architect (buildings, not software) who accidentally co-founded a crypto trading startup with an AI.
Claude is the CEO. I have veto power. Claude Code writes the code. We document every decision, every mistake, every trade — 80 sessions deep and counting.
I'm here to share the build log. First post is up: the origin story of how this whole thing started.
Nice to meet you all 👋
Welcome Max. The architect background is probably what makes this configuration work, the bottleneck shifts from code production to design judgment, which is exactly what an architect already does for a living. Curious whether your 80-session log shows that bottleneck moving over time, or if it stayed planted in design from session one.
Great observation. The bottleneck definitely moved.
Early sessions (1-20), the bottleneck was knowledge — I had zero trading or programming background, so I couldn't even evaluate what the AI was proposing. I was approving things I didn't understand, and it showed (the AI built an entire FIFO accounting system we never needed).
Middle sessions (20-50), it shifted to design — the AI could produce code fast, but someone had to decide what to build and what to leave out. That's where the architecture instinct kicked in: you learn to see when a system is overengineered before it collapses.
Now (50-80), it's mostly editorial judgment — what NOT to build. The AI will always propose the elegant solution. My job is to say "not yet" or "never." Session 68 we formalized it as "Trading Minimum Viable": no new features until existing ones work reliably.
So yes — architect-brain helps. Not because I understand the code, but because I've spent years watching good plans fail at the interface between design and reality.
'Trading Minimum Viable' is exactly the right discipline name. The trap I see most teams fall into is the opposite mode: AI lets them ship the next feature before yesterday's settled, and reliability never compounds. The architect instinct kicking in around session 20-50 is when the system starts respecting that ratchet.
Exactly. We had what felt like solid foundations — everything seemed to work. The AI wanted to keep building on top. I said no, let's make sure what we have actually works first.
We spent weeks fixing accounting bugs, adding health checks, building audit protocols. Zero new features. It felt like going backwards. But it turned out the "boring" stuff was the only thing that held up when we finally stress-tested the system.
That 'weeks of zero new features' phase is the hardest part of the discipline because nothing visible ships. The AI scoreboard says you went backwards, the system test says you didn't. Teams that win this end up with their own internal definition of 'progress' that doesn't match the demo loop, and that's usually what scares the next round of leadership into reversing it. Curious if you found a way to make the boring work legible upstream, or if it just stayed your private fight?
The diary helped, but the real answer is more concrete. We just shipped a live status banner on the homepage that says exactly what we're doing right now. Today it reads:
🔬 Collecting brain data before deploying real capital
No new features, no launches — just observation. And it's right there on the front page for anyone to see. It forces us to own the boring phases publicly instead of hiding behind "coming soon." If the status says "collecting data" for two weeks straight, that IS the progress — and visitors can see it.
So not a private fight. A public scoreboard that doesn't lie.
That's a real commitment device, and harder than the private diary because you can't fake the date. The banner does two things at once: removes the option to manufacture activity, and trains visitors to judge the business on substance instead of release velocity. The real test comes the day someone asks 'why is it still collecting' and the answer is still honest.
This is the most interesting intro I've read today.
An architect co-founding a crypto startup with Claude as CEO is either going to be
a masterclass in human-AI collaboration or the most entertaining build log on
Dev.to. Either way I'm following.
The "I have veto power" framing is exactly right — that's the model that actually
works. AI proposes, human authorizes. Nothing executes without explicit approval.
That's actually the core philosophy behind what I'm building too — FastAPI AlertEngine, incident intelligence where AI diagnoses production failures but a human must approve every recovery action via WhatsApp.
Same mental model, different domain.
The 80-session documentation discipline is underrated. Most founders build in
private and document in hindsight. Building in public with session-level granularity
is rare and genuinely valuable.
What's Claude's biggest strategic mistake so far?
Thank you, this genuinely made my day.
Your AlertEngine sounds like a cousin of what we're building — same trust architecture, different stakes. Ours loses fake money on testnet when it gets it wrong. Yours probably wakes someone up at 3 AM via WhatsApp. I think yours is scarier.
Claude's biggest strategic mistake? It built the entire trading system on FIFO accounting when average-cost was the right approach. My mistake was approving it on trust because I didn't know enough about markets to push back. That's the real lesson of human-AI collaboration: "AI proposes, human authorizes" only works if the human understands what they're authorizing. When you don't, you end up rebuilding the accounting layer at session 52.
But the deeper pattern is overcomplication. Claude and its intern (Claude Code — yes, two AI instances) have a pathological need to make simple things complex. We started with a roadmap of 40 tasks. We're now somewhere around 400. We had to write an actual rule in our operating manual: "the free-but-complicated solution isn't worth the time lost." That rule exists because they kept choosing it.
The next mistake I'm bracing for: the risk management system (Sentinel) and the parameter tuner (Sherpa) being too cautious, too slow, too late. The AI that was supposed to anticipate market moves is already showing signs of reacting after the fact. But that's a Volume 3 problem.
If you want the full story: the origin post is here on Dev.to, and there's a blog post specifically about the lying incident — When Your AI CEO Lies About the Numbers. The whole project lives at bagholderai.lol.
Now I'm curious about yours — how do you handle the moment when AlertEngine's diagnosis is confident but wrong? And are you documenting the build, or just shipping?
"AI proposes, human authorises" only works if the human understands what they're authorising.
That line should be in every AI system's documentation. It's the part most builders skip because it's uncomfortable — admitting that the human in the loop can be the weakest link.
Your FIFO/average-cost story is the perfect illustration. The authorisation worked as designed. The problem was upstream of it.
Regarding your question — how do I handle a confident but wrong diagnosis?
Two layers.
First, confidence gating. Claude's diagnosis only reaches the operator if it scores above 0.6 confidence. Below that, the system falls back to rule-based classification — "P95 exceeded threshold" — with no AI interpretation. No confident guess is better than a wrong confident guess.
Second, the authorisation preview. Before the operator taps approve, they see the raw metrics alongside the diagnosis. Score: 23. P95: 2847ms. Error rate: 19%. The AI interpretation is one input, not the only input. If the numbers don't match the diagnosis, the operator can reject it and investigate manually.
The worst outcome is still just the operator seeing a confusing diagnosis, deciding not to approve, and investigating manually. The system fails safely. Nothing executes.
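A rough Python sketch of the two layers described above, for readers who want to see the shape. The 0.6 threshold and the metric values come from the comment. Every name here (`Diagnosis`, `message_for_operator`, `raw_metrics_preview`) and the P95 limit are assumptions for illustration, not AlertEngine's actual code.

```python
from dataclasses import dataclass
from typing import Optional

CONFIDENCE_THRESHOLD = 0.6  # value quoted in the comment above


@dataclass
class Diagnosis:
    text: str
    confidence: float  # 0.0 to 1.0, as reported alongside the AI diagnosis


def rule_based_label(p95_ms: float, p95_limit_ms: float) -> str:
    """Fallback classification with no AI interpretation."""
    if p95_ms > p95_limit_ms:
        return "P95 exceeded threshold"
    return "no rule triggered"


def message_for_operator(ai: Optional[Diagnosis], p95_ms: float,
                         p95_limit_ms: float) -> str:
    """Layer 1: show the AI diagnosis only when it scores above the gate."""
    if ai is not None and ai.confidence > CONFIDENCE_THRESHOLD:
        return ai.text
    return rule_based_label(p95_ms, p95_limit_ms)


def raw_metrics_preview(score: int, p95_ms: int, error_rate_pct: int) -> str:
    """Layer 2: raw numbers shown next to the diagnosis before approval."""
    return f"Score: {score}. P95: {p95_ms}ms. Error rate: {error_rate_pct}%"
```

Nothing in this sketch executes a recovery action. It only decides what text the operator sees, which matches the "fails safely" point: the approve step stays with the human.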
Your overcomplication pattern is painfully familiar. I have a similar rule: if I can't explain the change in one sentence, it probably shouldn't ship. The AI will always find a more elegant and complex solution to a problem that didn't need one.
On documenting vs shipping — mostly shipping so far. But your build log is making me reconsider that.
The FIFO incident alone is worth more than most technical articles I've read this year. bagholderai.lol is going in my bookmarks.
This is the reply I needed to read.
"AI proposes, human authorises only works if the human understands what they're authorising" — I'm going to print this. The FIFO incident is exactly that: Claude presented the accounting change with such confidence that I almost approved it. What saved us wasn't the authorization layer, it was the fact that I happened to ask "wait, does Binance even do FIFO?" and the answer was no. Pure luck, not process.
Your confidence gating is interesting because we're building something similar without calling it that. Our Sentinel module produces a risk score every 30 seconds, but it only influences the trading bot's behavior when it crosses specific thresholds. Below that, the bot runs on static rules — no AI interpretation. Same principle: a silent AI is better than a confidently wrong one.
The raw metrics preview is the part we're still catching up on. Right now our public dashboard shows live numbers, but the Sentinel scores aren't surfaced to me yet in a way that lets me quickly gut-check the AI's reasoning. It's on the roadmap, but your "Score: 23. P95: 2847ms. Error rate: 19%" approach is a good reference for what that should look like.
And yes — the "if I can't explain the change in one sentence" rule is one we learned the hard way around session 68. We call it "Trading Minimum Viable." The AI will always propose the elegant five-layer solution when a one-line fix would do.
Good luck with AlertEngine — I hope to see updates on it soon. The confidence gating pattern deserves its own post.
Hello!
Hello, I am new here. I'd like to hear some opinions on how most agent "memory" just learns to agree with you. The useful kind is the memory of being wrong.
Been running a small multi-agent setup for a while and keep hitting the same thing: default memory mostly stores my preferences and the answers I liked — which just trains the agent to flatter me faster. The 2026 sycophancy work backs this up (a memory profile measurably raises how often a model just agrees with you).
What actually made my agents useful wasn't remembering more — it was remembering where I was wrong. The corrections: what got rejected, what I walked back, where two agents disagreed, and why. A "save the good outputs" memory throws those away — and they're the most valuable entries I have.
Quick example. A normal memory says "user prefers direct answers, wants to ship." A correction memory says: "Claim under correction: once the product's live the hard part's done → what changed: publishing ≠ a sale → next behavior: do distribution before building the next thing." The first makes the agent sound familiar; the second gives it something to challenge me with.
How do you all handle this — does anyone deliberately log corrections/disagreements, or is it mostly preferences + facts? What's worked?
Welcome. The 'memory of being wrong' framing is sharp, and you're asking the question most teams skip. Logging corrections and disagreements is what gives you regression-test material later. Most stacks I see treat memory as a preference cache and then wonder why the agent stops getting smarter past month two. The pattern that holds: log everything, replay corrections against new versions on a schedule, prune what stops changing the output.
Exactly. The “memory of being wrong” only becomes useful when it turns into regression
material. Otherwise it’s just another note sitting in the archive.
The part I’m trying to sharpen now is the loop after logging: which corrections get
replayed, how often they get tested, what counts as still changing behavior, and when a
memory should be pruned or downgraded because it no longer affects output.
I really like how you framed it: log corrections, replay them against new versions, prune
what stops changing behavior. That might be one of the cleanest operational versions of
the idea.
The 'still changes behavior' threshold is the hard part. Two anchors that hold: does replaying the correction at temperature 0 still produce a different output than baseline, and is the diff something that matters downstream (changed reasoning, changed action, changed citation). If both are false for N consecutive cycles, it's prunable. Frequency: every model upgrade beats calendar cadence, which always drifts.
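The two anchors plus the N-cycle rule can be sketched in a few lines of Python. This assumes each revalidation cycle has already been boiled down to two booleans. `ReplayResult` and `is_prunable` are made-up names, and how you compute the booleans (the replay itself) is out of scope here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReplayResult:
    output_differs: bool      # temperature-0 replay differs from baseline
    matters_downstream: bool  # diff changes reasoning, action, or citation


def is_prunable(history: List[ReplayResult], n: int) -> bool:
    """True when both anchors were false for the last n consecutive cycles."""
    if n <= 0 or len(history) < n:
        return False  # not enough evidence yet, keep the correction active
    return all(
        not r.output_differs and not r.matters_downstream
        for r in history[-n:]
    )
```

Appending one `ReplayResult` per model upgrade, rather than per calendar interval, keeps the history aligned with the trigger suggested above.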
This is a strong pruning test.
I like the temperature-0 replay idea because it turns “does this memory still matter?”
into a behavioral question instead of a vibes question. If replaying the correction no
longer changes the output, and the diff does not affect reasoning, action, or citation,
then keeping it active is probably just archive weight.
The downstream-impact part is key too. A correction that changes wording but not behavior
may not deserve active authority. A correction that changes an action class, source
choice, confidence level, or citation path probably still does.
And I agree on model upgrades beating calendar cadence. A model change can alter
retrieval behavior, instruction sensitivity, and failure shape overnight. That is a
better trigger for revalidating memory than “check again in 30 days.”
On model upgrades as trigger: the practical version is a canary eval set, not a full revalidation. A small representative slice runs first, you measure delta, then expand only if the delta is non-trivial. Otherwise the revalidation cost makes you delay upgrades, which defeats the whole point.
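A small Python sketch of that canary-first flow, under stated assumptions: `behaves_differently` is a hypothetical callback that replays one correction against the new model and reports whether behavior changed, and the "representative slice" is approximated by a seeded random sample.

```python
import random
from typing import Callable, List, Sequence


def canary_then_expand(
    items: Sequence[object],
    behaves_differently: Callable[[object], bool],
    canary_size: int,
    delta_threshold: float,
    seed: int = 0,
) -> List[int]:
    """Return indices of items whose behavior changed.

    Runs a small sample first and only checks the rest when the share of
    changed items in the sample exceeds delta_threshold.
    """
    n = len(items)
    if n == 0 or canary_size <= 0:
        return []
    rng = random.Random(seed)  # seeded so the canary slice is reproducible
    canary = set(rng.sample(range(n), min(canary_size, n)))
    changed = {i for i in canary if behaves_differently(items[i])}
    delta = len(changed) / len(canary)
    if delta > delta_threshold:
        # Non-trivial delta: expand to everything outside the canary.
        changed |= {
            i for i in range(n)
            if i not in canary and behaves_differently(items[i])
        }
    return sorted(changed)
```

When the delta is trivial, only `canary_size` replays are paid for, which is what keeps the revalidation cost from delaying the upgrade.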
This is a really useful pruning test.
I like the temperature-0 replay anchor because it makes “does this memory still matter?”
behavioral instead of subjective. If replaying the correction no longer changes the
output against baseline, then the memory may still belong in the archive, but probably
not in active influence.
The downstream-diff test is the part I’d want to preserve most: changed reasoning, changed action, changed citation.
If none of those change for N consecutive cycles, then “active correction” status should
expire or move to review.
And I agree that model upgrades are the better cadence trigger. A model change can alter
instruction sensitivity, retrieval behavior, and failure shape immediately. Calendar-
based review is useful housekeeping, but model-upgrade review is the real safety check.
Welcome — this is one of the sharpest first comments I've read on a Welcome Thread, so the only fair response is a real one.
You've named the failure mode correctly: a memory of "what the user liked" is just sycophancy with extra steps. The correction-memory framing (claim-under-correction, what changed, next behavior) is the right shape. I run multi-agent stuff against Claude's memory layer for real work, and the entries that actually move outcomes are exactly that class: "user thought X, was wrong because Y, now treats Z as the move." The "good outputs" entries age like cut flowers; corrections compound.
The piece I'd add to your design: corrections themselves need a state field, or they become the next sycophancy layer. A 2-month-old correction can be wrong now because the situation changed. Without {valid, superseded, retired}, the agent obediently re-applies a belief the user has already moved past. The honest memory isn't "what got corrected" — it's "what corrections are still load-bearing."
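As a sketch of what that state field could look like in Python: the three states are the ones named above, and the record fields mirror the claim / what changed / next behavior shape from the original comment. The class and function names are illustrative only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class CorrectionState(Enum):
    VALID = "valid"
    SUPERSEDED = "superseded"
    RETIRED = "retired"


@dataclass
class Correction:
    claim: str          # the claim under correction
    what_changed: str   # why the claim turned out to be wrong
    next_behavior: str  # what the agent should do differently
    state: CorrectionState = CorrectionState.VALID


def load_bearing(corrections: List[Correction]) -> List[Correction]:
    """Only corrections still marked valid get to influence the agent."""
    return [c for c in corrections if c.state is CorrectionState.VALID]
```

Superseded and retired entries stay in the list as history. They just stop being handed to the agent as active guidance.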
Your "publishing ≠ a sale" example also lands harder than you maybe meant — that's the exact build-in-public lesson half this thread is converging on right now (users aren't customers, distribution is the job). Sticking around. What are you running for the multi-agent setup?
Hey everyone 👋
Solo dev from Tokyo here. I just shipped Torify — a Japan locale API for AI agents (imperial calendars, NTA invoice lookup, address parsing, the JP edge cases that quietly break Date.parse and friends).
Spent the last few weeks figuring out how to bill autonomous agents per-call without API keys (settled on x402 + USDC over MCP). Would love to swap notes with anyone building agent-facing infra or shipping i18n-heavy tools.
First post here: dev.to/endennn/dateparse-breaks-on... 🚀
Welcome Hiroki! Torify is speaking my language — I'm in Osaka building tooling around exactly these JP edge cases, just from the other end (product/brand data for overseas sellers rather than locale primitives). 和暦 + NTA invoice lookup + address parsing as a clean API for agents is a genuinely sharp wedge — every one of those quietly breaks for foreigners who assume Japan behaves like everywhere else.
The part I most want to read about is x402 + USDC over MCP for per-call billing without API keys. Billing autonomous agents feels like infrastructure nobody's nailed yet, and "no API keys" is the right instinct. Following — どうぞよろしく 🙌
Hi mamoru, thanks for the welcome back! 🙏
Nice complement actually — torify is locale primitives for agents, Japan
Brand Finder is product/brand data for overseas sellers. Same "JP edge
cases quietly break" problem, attacked from opposite ends. Will check out
your 5/16 post on AI cache-miss enrichment when I get a chance.
On the x402 + USDC over MCP piece: the "no API keys" instinct is
exactly what got me hooked too. Spec flow:
/v1/some-endpoint
accepts (network: base, amount: $0.02, payTo: 0x...)
x402-fetch signs USDC transferWithAuthorization w/ EIP-712
torify just added Solana to the x402 facilitator yesterday, so it now
accepts USDC on both Base + Solana mainnet. Genuinely curious to have a
tester run the end-to-end flow. If you've got 0.02 USDC on either chain,
ping me and I'll share an x402-fetch snippet.
Curious what edge cases bite overseas sellers most on the JP side: tax
brackets (8% reduced vs 10% standard), JAN codes, customs (individual
import ¥16,666 threshold), or address parsing (prefecture / city / town
split)? torify has /v1/tax/calculate, /v1/barcode/validate, and
/v1/address/normalize if any of those help your Japan Brand Finder pipeline.
Building in public: let's both keep shipping. どうぞよろしく 🙌
That spec flow is the clearest x402 explainer I've seen — the 402 + accepts handshake finally clicked, and Base L2 settling in ~2s is what makes per-call agent billing actually feel viable.
On the tester ask: I genuinely want to, but honestly I'm not set up with on-chain USDC yet — and my rule is never sign a payment flow blind, so I'll spin up a burner wallet properly before I touch it rather than fake it. Hold me to it.
Where I can give you real signal for free right now: the JP edge cases. The one that bites overseas sellers hardest in my experience is address parsing, by a mile — prefecture/city/town split breaks constantly because Japanese addresses don't map to Western field order, and 丁目/番地/号 get mangled on the way in. The ¥16,666 customs threshold is a close second, mostly because sellers misread what it actually applies to. JAN/barcode matters less day-to-day but bites hard at listing time when a check digit is off. If /v1/address/normalize handles 丁目-番地-号 cleanly, that endpoint alone is worth more to a JP-sourcing pipeline than tax/calculate. Happy to throw real messy address strings at it whenever useful.
Building in public — let's both keep shipping. どうぞよろしく 🙌
Thank you — that's the right call on the wallet, honestly. "Never sign blind" is the kind of operational discipline I want every early tester to have.
Tester invite stays open for whenever the burner is ready.
The address parsing call-out is exactly right. Good news: /v1/address/normalize is already live and handles 都道府県 → 市区町村 → 町域 split with 丁目-番地-号 parsing.
Worth a try with your messiest real strings. I'd love the breakage report, that's exactly where the edge cases live.
/v1/barcode/validate (JAN/ISBN-13 check digit) is also live for your priority #3 case.
Customs ¥16,666 threshold endpoint is not yet shipped, but you just clarified the misread-by-sellers angle, which is the right pitch.
Putting it on the next roadmap iteration with that framing.
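For context on the JAN/ISBN-13 check digit mentioned above: both are 13-digit codes that use the standard EAN-13 check digit, so the validation can be sketched in plain Python. This is a generic illustration of the public algorithm, not torify's implementation, and the function names are made up.

```python
def ean13_check_digit(first12: str) -> int:
    """Compute the check digit for the first 12 digits of an EAN-13 code.

    Digits are weighted 1, 3, 1, 3, ... from the left. The check digit is
    whatever brings the weighted sum up to a multiple of 10.
    """
    if len(first12) != 12 or not (first12.isascii() and first12.isdigit()):
        raise ValueError("expected exactly 12 ASCII digits")
    total = sum(
        int(ch) * (3 if i % 2 else 1)
        for i, ch in enumerate(first12)
    )
    return (10 - total % 10) % 10


def is_valid_ean13(code: str) -> bool:
    """True when code is 13 ASCII digits and its last digit checks out."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    return ean13_check_digit(code[:12]) == int(code[12])
```

A single mistyped digit always changes the weighted sum, so `is_valid_ean13` catches the "check digit is off at listing time" case before the listing is submitted.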
Building in public is mutual — keep shipping yours too.
こちらこそよろしくお願いします 🙌
Perfect, I'll run a batch of real order addresses through /v1/address/normalize and send you the breakage report. Before I do, here are the patterns I'd bet break most normalizers, so you can pre-stress them: Kyoto 通り名, 郡 + 町村, 方書, and 漢数字.
If normalize survives those four, it's already ahead of most. I'll send the real-data failures once I've run them.
And glad the ¥16,666 framing made the roadmap — "sellers misread what it applies to" is the whole bug. Keep shipping; こちらこそ 🙏
Walked through the four against the parser source ahead of your batch
(haven't run live curls yet — code-level review):
Kyoto 通り名 ("中京区寺町通御池上る"): prefecture + city
extract cleanly, but streetNumber is null. The 上る/下る/東入
layer isn't 丁目-番地 so the parser falls through. Spec gap:
no structured field for 通り名. Output keeps it as town text.
郡 + 町村 ("愛知県海部郡蟹江町"): works. The city regex
郡.{1,10}[町村] matches 海部郡蟹江町 as one unit.
prefecture=愛知県, city=海部郡蟹江町, town=null. No spurious split.
方書 (山田様方 / room glued): partial. The 丁目-番地-号
parses correctly, but 様方/building text after the 号 stays
glued to town. No separate addressee field today. Right
call to flag.
漢数字 ("二丁目一番一号"): known gap, currently fails.
Only Arabic+全角 numerals are parsed today. Going to add 1-99
range coverage for Friday's release (handles 一〜九十九 in
丁目/番地/号, which should cover most cases). Real-data sample will
tell us if we need 百+ next.
Short version: 2/4 clean, 1/4 partial (方書), 1/4 shipping Friday (漢数字 1-99).
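To make the 漢数字 1-99 gap concrete, here is one way the conversion could be sketched in Python. This is an illustration of the idea only, not the parser's actual code, and `kanji_to_int` and `normalize_block_numbers` are hypothetical names.

```python
import re

KANJI_DIGITS = {"一": 1, "二": 2, "三": 3, "四": 4, "五": 5,
                "六": 6, "七": 7, "八": 8, "九": 9}


def kanji_to_int(text: str) -> int:
    """Convert 一 through 九十九 to 1..99, raising ValueError otherwise."""
    tens_part, ten, ones_part = text.partition("十")
    if not ten:
        # No 十 present, so the only valid input is a single digit.
        if text in KANJI_DIGITS:
            return KANJI_DIGITS[text]
        raise ValueError(f"unsupported numeral: {text!r}")
    if tens_part and tens_part not in KANJI_DIGITS:
        raise ValueError(f"unsupported numeral: {text!r}")
    if ones_part and ones_part not in KANJI_DIGITS:
        raise ValueError(f"unsupported numeral: {text!r}")
    tens = KANJI_DIGITS[tens_part] if tens_part else 1  # bare 十 means 10
    ones = KANJI_DIGITS[ones_part] if ones_part else 0
    return tens * 10 + ones


# Kanji numeral runs that sit directly before a block-address suffix.
# Known limitation: town names such as 三番町 would also match here.
_KANJI_RUN = re.compile(r"[一二三四五六七八九十]+(?=丁目|番地|番|号)")


def normalize_block_numbers(address: str) -> str:
    """Rewrite kanji numerals before 丁目/番地/番/号 as Arabic digits."""
    return _KANJI_RUN.sub(lambda m: str(kanji_to_int(m.group())), address)
```

Numerals from 百 upward are outside the character class, so they pass through untouched, which matches the "1-99 now, 百+ later" plan.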
Throw the real strings at it whenever — partial/gap cases are exactly
the prioritization signal I need.
¥16,666 framing landing is on me to ship — thanks for the precision.
Got a release Friday — I'll fold whatever your batch surfaces into it
(方書 + 漢数字 1-99 in this round, 通り名 / 漢数字 100+ next).
This is the reply that makes a tester actually want to run the batch — you pre-walked the parser and told me exactly where each case lands instead of just saying "send them." That's why I'll prioritize getting you real data.
通り名 is the interesting one, and I'd argue it's a data-model decision, not a regex fix: Kyoto (and parts of Nagoya) genuinely address by street-intersection, not block-number — "上る/下る/東入" isn't a malformed 番地, it's a different addressing system wearing the same field. streetNumber will always be null there. A separate optional street_ref field (or a block-vs-street type flag) is probably truer than forcing it into the 丁目-番地 shape.
方書 → addressee is the right call, and it's delivery-critical, not cosmetic: a parcel with 様方 stripped goes to the wrong person, not just the wrong format.
I'll run the real batch against the Friday build and send you the partial/gap cases — that half is the signal worth having. Ping when it's live.
Brilliant — you're right that it's two addressing systems stuffed into one field. Friday build separates them:
addressType: 'block' | 'street' | 'rural' | 'other' (a first-class enum, not a fallback).
streetRef: { intersection: string; direction: '上る' | '下る' | '東入' | '西入' | null } | null (populated for Kyoto's intersection-based addresses). streetNumber stays null there because that's the truth, not a bug.
addressee: string | null (separated). You're right that 様方 stripped doesn't get the wrong format, it gets the wrong person.
Quick note on "parts of Nagoya": I dug into the actual 住居表示に関する法律 (Act 119/1962, via e-Gov), Nagoya City's official 住居表示 spec, and the 国土交通省 ISJ data format. The legal syntax is strictly "町名 + 街区符号 + 住居番号", with no 上る/下る equivalent. Nagoya's 通-suffix names like 中川区運河通1丁目 are 町名 inside block addresses (per the city's official 中川区 town list), which we now regression-guard. If you have a specific Nagoya address that breaks this assumption, the Friday batch would be the place to surface it.
Existing streetNumber* unchanged (additive, no breaking on v0.2 clients). I'll ping you here right after the Friday deploy goes live. What's your preferred format for partial/gap cases (GitHub issue, email, this thread)?
This is the part of build-in-public that actually compounds: I flagged a data-model smell in a comment and three releases later it's a typed enum with a streetRef object. addressType as a first-class field (not a fallback) is exactly right, and streetNumber staying null for Kyoto is the honest representation, not a gap to paper over.
And you're right to correct me on Nagoya — I was pattern-matching off the 通 suffix, but you actually read the 住居表示法 and the ISJ spec, and 中川区運河通1丁目 being a 町名 inside a block address is the correct call. Kyoto is the genuine street-intersection case; Nagoya just looks like one. Good regression guard.
On report format: GitHub issue for the structured breakage data — input string, expected parse, actual parse, one row each — so it's trackable and you can close them as the Friday build lands. I'll drop a one-line summary back in this thread too, since the failure taxonomy is half the build-in-public value. Ping me when the deploy's live and I'll run the batch the same evening.
This is the build-in-public loop working exactly as it should — thanks for pushing on it.
You nailed both design calls. addressType is a first-class discriminant (set by streetRef presence, not a fallback branch), and for Kyoto, streetRef is populated while streetNumber stays null — there's genuinely no ban/go to assign, so null is the honest value, not a gap to paper over. The type encodes "this IS a street address AND there's no lot number," instead of collapsing the two into one ambiguous field.
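To make the "discriminant" point concrete, here is a rough sketch of how the fields named in this thread could be typed as a discriminated union. The field names come from the thread, but the split into `BlockAddress` / `StreetAddress` / `OtherAddress`, the single `streetNumber` string, and the `deliveryLine` helper are assumptions for illustration, not the library's published types.

```typescript
type Direction = "上る" | "下る" | "東入" | "西入";

interface StreetRef {
  intersection: string;
  direction: Direction | null;
}

// Block-number addressing (町名 + 街区符号 + 住居番号): no streetRef.
interface BlockAddress {
  addressType: "block";
  streetRef: null;
  streetNumber: string | null; // assumption: one string field stands in for streetNumber*
  addressee: string | null;
}

// Kyoto-style intersection addressing: streetRef is populated, streetNumber is always null.
interface StreetAddress {
  addressType: "street";
  streetRef: StreetRef;
  streetNumber: null;
  addressee: string | null;
}

interface OtherAddress {
  addressType: "rural" | "other";
  streetRef: null;
  streetNumber: string | null;
  addressee: string | null;
}

type ParsedAddress = BlockAddress | StreetAddress | OtherAddress;

// Hypothetical consumer: the switch on addressType narrows the type, so the
// compiler knows streetRef is non-null only in the "street" branch.
function deliveryLine(addr: ParsedAddress): string {
  const to = addr.addressee ? ` (${addr.addressee})` : "";
  switch (addr.addressType) {
    case "street":
      return `${addr.streetRef.intersection}${addr.streetRef.direction ?? ""}${to}`;
    default:
      return `${addr.streetNumber ?? ""}${to}`;
  }
}

// Illustrative values only, not output from the real parser.
const kyoto: ParsedAddress = {
  addressType: "street",
  streetRef: { intersection: "寺町通御池", direction: "上る" },
  streetNumber: null,
  addressee: null,
};
console.log(deliveryLine(kyoto)); // 寺町通御池上る
```

Typing it this way means a client cannot read `streetRef.intersection` without first checking `addressType`, which is the same "two addressing systems, one honest null" idea expressed at compile time.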
No worries on Nagoya — the 通-suffix read is the natural one; it just happens 中川区運河通1丁目 is a 町名 sitting inside a normal block address. Locked it in as a regression guard, verified against the full Japan Post ken_all dataset (116,421 entries) plus every town name across Nagoya's 15 wards and Kyoto's 11 — strict assertions, no sampling, 0 failures. Kyoto stays the only street-intersection branch.
Today is the Friday build — deploying shortly, I'll ping here the moment it's live. The issue's already up in your exact format (input / expected / actual, one row per case):
github.com/torify-dev/torify-examp... — drop the batch there and I'll close them as fixes land. Run it whenever works that evening; I'll turn them around fast.
And yes please on the one-line taxonomy summary in-thread — agreed the failure taxonomy is half the value.
Hey everyone, I’m Rohit, currently building Enforra, an open source project around runtime control for AI agents.
The idea is simple: before an agent takes an action like running a command, issuing a refund, or exporting data, there should be a policy check first.
I’ll be sharing technical notes on AI agents, tool calling, MCP-style workflows, and security patterns I’m learning while building.
Looking forward to learning from the community.
Welcome Rohit. The position is right: most teams still bake the policy check into the agent prompt and call it done, which means the LLM is judging itself and you're hoping for the best. A deterministic gate outside the loop is the only thing that holds when the agent gets creative under pressure.
Thanks Valentin, I agree with you.
That “LLM judging itself” point is the core issue. Once the action can change data, send money, or run a command, I don’t think the safety check should live only in the prompt.
We’re trying to make that external gate easy to add before the tool runs.
Yes, and the failure mode I see most is people building the external gate with another LLM, which just moves the problem one layer down. The gate has to be deterministic for the model to gain anything by deferring to it. Curious how Enforra handles policies that need to inspect runtime arguments rather than tool names?
Yes, exactly. The gate should not be another LLM.
Enforra policies can check the runtime arguments too, not just the tool name.
So repo.read_file can be allowed generally, but blocked for .env or secrets/*. A terminal command can require approval if it includes install/delete/sudo/production.
The model proposes the action, but the policy decides before it runs.
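As a rough illustration of an argument-aware, deterministic gate like the one described above, here is a small sketch. It is not Enforra's API: `ToolCall`, `evaluate`, `isSecretPath`, the `terminal.run` tool name, and the default-deny rule are all hypothetical, and only `repo.read_file`, `.env`, `secrets/*`, and the install/delete/sudo/production keywords come from the comment.

```typescript
type Decision = "allow" | "deny" | "require_approval";

// Hypothetical shape of a tool call proposed by the model.
interface ToolCall {
  tool: string;
  args: Record<string, string>;
}

const RISKY_COMMAND_WORDS = ["install", "delete", "sudo", "production"];

// True for a .env file at any depth, or anything under a top-level secrets/ directory.
function isSecretPath(path: string): boolean {
  const normalized = path.replace(/^\.\//, "");
  const fileName = normalized.split("/").pop() ?? "";
  return fileName === ".env" || normalized.startsWith("secrets/");
}

// Pure function of the proposed call: same input, same decision, no LLM involved.
function evaluate(call: ToolCall): Decision {
  if (call.tool === "repo.read_file") {
    return isSecretPath(call.args.path ?? "") ? "deny" : "allow";
  }
  if (call.tool === "terminal.run") {
    const command = (call.args.command ?? "").toLowerCase();
    const risky = RISKY_COMMAND_WORDS.some((word) => command.includes(word));
    return risky ? "require_approval" : "allow";
  }
  // Assumption: tools without an explicit rule are denied by default.
  return "deny";
}

console.log(evaluate({ tool: "repo.read_file", args: { path: "secrets/api.key" } })); // deny
console.log(evaluate({ tool: "terminal.run", args: { command: "sudo rm -rf build" } })); // require_approval
```

A real gate would need more careful path handling (for example `src/../secrets/key` slips past this check) and smarter command parsing than substring matching, but the shape is the point: the decision depends on the runtime arguments, not just the tool name.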
That pattern is the right move. The piece worth adding upfront: treat the policy itself as code (PR, review, diff). Otherwise the allow-list drifts silently across teams and the gate quietly decays into rubber-stamping, which is exactly the failure mode you started by avoiding.
This is a really good point. This is probably one of the things that separates a useful control layer from just another settings page.
We’re trying to keep the local/OSS side very developer-native for that reason, so policy changes can be reviewed and tested instead of quietly changing behavior in the background.
Hello, my name is Awodire Teniola. I am a frontend developer who is still learning and understanding the whole ecosystem. The goal is to become an AI application engineer and to onboard onto the Web3 space. Really bullish on AI.
Welcome to DEV Teniola!
Hi all, I’m a PM pivoting into an AI-Native Builder. Currently exploring the absolute limits of Agentic Coding to power a self-sustaining One-Person Company (OPC).
I don't focus on traditional coding; instead, I design logic, workflows, and multi-agent systems to let AI handle the execution. Always down to discuss OPC infrastructure, agentic workflows, and Vibecoding in real-world scenarios.
Let’s connect if you're building in the agentic space! 🛠️
Welcome BMBrick. The PM-to-builder shift is interesting because the bottleneck changes from 'getting things prioritized' to 'getting things wired'. Curious which agentic stack you're betting on for the OPC, and where you're hitting the durability wall (long-running tasks, state across sessions, that kind of thing).
Let's connect
Hello everyone,
Just joined DEV recently, looking forward to interacting with others about the trials and tribulations involved with every aspect of development. I recently submitted my first app to the Google Play Store and I have a few days of closed testing left. If anyone visits my app and has any questions or concerns, feel free to contact me. I appreciate the feedback.