I've been building VibeCheck for the past few months - it's a security scanner specifically for AI-generated code. And after scanning over a hundred real codebases that people built with Cursor, Copilot, Claude, and various other AI tools, I have thoughts.
Not the "AI is dangerous" hot take. Something more specific than that.
The pattern that kept showing up
Almost every codebase had the same category of issue. Not SQL injection or XSS or anything that would show up in a classic OWASP checklist. The dominant problem was what I started calling trust misconfigurations - places where the code just... assumed everything was fine.
Open CORS policies. Service accounts with admin permissions because that was the fastest path to getting it working. API keys hardcoded in config files that weren't in .gitignore. Input that got passed straight into shell commands with no sanitization.
None of it was malicious. The AI wasn't trying to introduce vulnerabilities. It was just optimizing for "make it work" and had zero weight on "make it survivable in production."
The thing that surprised me most
I expected the biggest problems in the actual logic - like the AI misunderstanding authentication flows or getting crypto wrong. That exists too, but it's not the main thing.
The main thing is environmental. All these tiny decisions about permissions and access and trust that a senior dev would make automatically, almost subconsciously, because they've been burned before - the AI just doesn't make those decisions. It picks the path of least resistance every time.
One project had a DB connection string with full admin creds, no connection pooling limits, and a query that accepted raw user input. Technically functional. Completely fine for local dev. The kind of thing that gets quietly exploited six months after launch.
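The raw-input query is worth spelling out, because the fix costs almost nothing. A small illustrative sketch with `sqlite3` (the table and names are hypothetical, not from the scanned project):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

def find_user_unsafe(name: str):
    # Vulnerable: passing "' OR '1'='1" as name returns every row.
    return conn.execute(
        f"SELECT * FROM users WHERE name = '{name}'"
    ).fetchall()

def find_user_safe(name: str):
    # Parameterized: the driver treats name strictly as data, never SQL.
    return conn.execute(
        "SELECT * FROM users WHERE name = ?", (name,)
    ).fetchall()
```

Same function signature, same behavior on normal input; only one of them survives hostile input.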
What actually helps
Scanning after the fact (what we do with VibeCheck) catches the obvious stuff. But the real fix is earlier in the loop.
The projects with the fewest issues were the ones where the developer was actually paying attention during generation - not just accepting output wholesale but reading it, asking "wait, why does this need admin access?" That friction. Even a little bit of it makes a big difference.
Some people are building this into their prompts - explicitly telling the AI to follow least-privilege principles, to validate all inputs, to never hardcode credentials. It works okay, but it feels like a workaround.
The better solution is probably tooling that runs in the background during vibe coding sessions and flags stuff in real time. Not a code review gate. Just... something watching.
The uncomfortable part
A lot of these codebases were shipped. Some had real users. A few were running in production environments with actual credentials and real data.
The developers weren't careless people. Most of them were genuinely excited about what they'd built - and most of what they built was genuinely cool. The security stuff just wasn't on their radar because it never came up during development. Nothing broke. Tests passed. It worked on their machine.
I keep thinking about that gap. Between "works fine in dev" and "safe to run with real users." AI coding tools are really good at closing the first gap - getting something functional fast. Nobody's really solved the second one yet.
That's the problem I'm trying to figure out. Not sure I have the answer yet. But the 100 codebases were pretty clarifying.
If you're using AI to build things and want to know what the scanner finds in your repo, VibeCheck is live. Free tier, no credit card. Takes about 2 minutes.
Top comments (89)
Can't this be mitigated by adding more context into the model?
it helps but only partially. i tried this - added security guidelines, told it to never hardcode secrets, use parameterized queries, etc. the model follows it when the task is clearly framed that way. but when it is doing something else (like "add this feature") it just... forgets. the security context doesn't carry over into every decision. what i found works better is having a separate security review pass after generation - treat it like a code review step, not something you bake into the initial prompt
well, one of my students is doing security knowledge graphs and compressing at the md level as a start. it's not the end, but it will bring fidelity to the front (problems/issues). try it out, I'd be happy to do it and give it back to you
that is actually a clever angle - compressing security knowledge into structured context rather than loose prose instructions. would be curious to see how well it holds up across different types of vulns. the fidelity problem is real, vague rules get interpreted loosely. share it when done
yeah, give me an input, or I can go with your thread, choose the battlefield and weaponry
haha ok - battlefield: a prod codebase that was 100% vibe coded in one weekend. weaponry: a security scanner and a code review. my bet is on the scanner finding something before you finish reading the first file
You can have a highly compressed, structured knowledge graph that encodes the structure of relationships, which gives you way more reasoning hops.
fair point on reasoning hops - that is a real advantage for knowledge-dense domains. different problem space than what i was scanning for though
What language are you speaking? What’s your problem?
First of all, this is not a problem I give a crap about philosophically, so I don’t work on that. You probably have these nuances that could be really graphed out. Are you graph aware? Are you context aware? How much reasoning hops or token density have you figured? You are married to the narrow mindedness that a relational database is any good, or legacy dogfood for gluttonous corporations?
haha no problem at all - this is just a good debate. i build things, you challenge assumptions, that is how it should work
Perfect! Slava Ukraine
No I build things. I have 3 million lines of code, 4 million tokens on one Mac mini on one max plan per month. I compress graph knowledge. I have 50 problems I’m solving. I’m not building?
Slava Ukraini 🇺🇦
But as a man of good sport and international goodwill, I accept the challenge to learn together. Mario Andretti said, if you seem like you got things under control, you’re not going fast enough. Speed kills. Probabilistically speaking, that is, in my little vibe coding mind.
graph awareness helps but most vibe coders are not reaching for neo4j, they are pasting code into chatgpt and clicking deploy. the relational db critique is fair for complex domains but the security issues i found were not about data modeling - hardcoded secrets, no input sanitization, open cors. those are not architecture problems
Create Claude skills to knowledge graph scanner, reduce to the MD level second knowledge graph the hardcoded secrets reduce those anomalies to an md level. Keep going. More edges and nodes and directions of our relationships in textured way. I have calculated 170x more reasoning hops than flat files. There’s math in interconnected relationships, which sounds spiritual. 🙏
Open claw? I never left my terminal as I was building things. If I solve problems, I don’t care what anybody says, right? It’s who cares who will win this game = context. Power to the people
170x is a bold claim - curious how you measured it. the interconnected relationships angle does have something to it though
Graphifymd.com
Or better why don’t I give you a compressed md file of a graph database of your request on here later this evening? Then you go work with it and get more context out of it, like I want to
yeah drop it when ready, genuinely curious what that looks like
Ok I’m driving. I need a few hours. I’m running errands
take your time, no hurry
But I need more context. Describe those signals you sense, we will get more fidelity. Describe your friction right now with this process, tell me what you wrestle with = context. More is better. Conversations recorded or voice notes are better than dumb prompts. Graph building skills are better than your representation, use your voice
main friction was that vibe-coded apps failed in ways that weren't obvious until you looked - surface looked fine, internals were not. that's the signal worth capturing
Precious/Oliver, who are you asking this?
not sure I follow - looks like this comment thread got a bit tangled. feel free to reach out directly if there is something specific you wanted to discuss
I think this problem would most commonly surface on apps that were built on a single go/prompt. When the whole codebase is generated as the output, it can be hard to wrap your head around what to look for and validate. Any "shortcuts" taken by AI at that point is easy to miss. Combined with the "don't fix it if it ain't broken" mentality and excitement to ship new apps, it's easy to see how people end up shipping something real but insecure.
Yeah, totally agree. The single-prompt codebases were the worst offenders in what I scanned - you could almost tell by the file structure alone. Everything flat, no real separation, and the AI just... didn't know what it didn't know. Auth mixed with business logic, env vars hardcoded because nothing was telling it otherwise. The "don't know what to validate" part is real too - when you've never had to think about the security surface of a piece of code, you have no mental model for what could go wrong. At least with iterative builds there's usually some moment where a human has to integrate things and maybe notices something feels off.
The framework here is backwards. Let me explain.
Boris Cherny created Claude Code and shipped 300+ PRs in December 2025 running 5 agents simultaneously. His setup is intentionally vanilla — minimal customization, trust the model, let it rip. That works when you're Boris and you've internalized 20 years of engineering patterns. The model fills in what he already knows.
But that's exactly the problem this post exposes. Most developers don't have Boris's mental model. So the AI optimizes for "make it work" with zero weight on security, architecture, or trust boundaries. The model doesn't know what it doesn't know.
The fix isn't scanning after the fact. It's structuring what the model knows before it writes a single line.
I've been building the opposite of Boris's vanilla approach. Instead of trusting the model to figure it out, I compress domain knowledge — security constraints, architectural rules, dependency relationships, trust boundaries — into structured .md files as knowledge graphs. Entities connected by typed relationships. Not prose instructions that get buried. Traversable constraints the model follows like a checklist it can't skip.
Two personas using the same tool: Boris (creator of Claude Code), and Dan (knowledge graph builder).
Boris can afford vanilla because he IS the knowledge graph. Most developers aren't Boris. They need the structure externalized.
It's the same pattern everywhere:
Chef vs. recipe follower. A Michelin chef doesn't need a recipe — decades of training IS the context. A home cook needs the recipe or they burn the sauce. The recipe is the knowledge graph. AI is the kitchen.
Surgeon vs. med student. A senior surgeon operates from pattern recognition built over thousands of procedures. A resident needs the checklist. Atul Gawande wrote a whole book about this — structured checklists reduced surgical deaths by 47%. Not because surgeons are bad. Because externalized structure catches what even experts miss under pressure.
Senior dev vs. vibe coder. A senior dev reads AI output and instinctively flags "wait, why does this have admin access?" A vibe coder ships it because nothing broke in dev. That instinct IS a knowledge graph — it's just trapped in one person's head.
The 100 codebases in this post are the home cooks, the residents, the vibe coders. They're not careless. They just don't have the graph externalized yet.
When those constraints are typed relationships in the model's context, it doesn't "forget" them. It can't take the path of least resistance because the graph constrains the path.
Same Claude. Same model. Different structure going in. Different security posture coming out.
The 100 codebases in this post didn't fail because AI is bad at security. They failed because nobody structured what "secure" means for that specific domain before the model started coding.
The numbers from my builds: 3M+ lines of code in 50 days. Solo. On Claude Code. 170x token density when compressing domain knowledge to .md. 93% token compression. ~3,000 tokens instead of ~500,000 for the same reasoning quality. 99.4% lower carbon per query.
That's what I build at Graphify.md — domain knowledge graphs compressed to portable .md. Works in Claude, ChatGPT, Cursor, anywhere. Security is just one domain.
graphifymd.com
yeah this is basically the crux of it. the model inherits your taste - if you have good taste, output is good. if you don't, the model confidently produces something that looks right but isn't. the security holes we found were exactly that pattern, generated code that looked professional but was missing the instincts a senior dev would have applied automatically
love on the model Mykola, context is love
haha context is love until it forgets everything past 200k tokens and you are back to square one
just like those pretty ladies in Kiev, back to square one after all those tokens gone, my life story.
haha too real - context window as a metaphor for life, hits different at 2am
So are you pleased with result? I’m happy if you’re happy.
The excitement to ship is the part that gets me. I have shipped stuff I knew wasn't fully reviewed because the demo was working and the window felt short. It's not even ignorance at that point, it's a conscious tradeoff you make and then just hope nothing happens. The difference with vibe-coded apps is the developer might not even know the tradeoff exists. They think the app is fine because nothing broke during testing. At least when I cut corners I know where I cut them.
the invisible tradeoff is exactly it. "nothing broke yet" and "nothing is broken" feel the same from the inside. experienced devs at least know they made a tradeoff. vibe coders might not even know the category of risk exists
That makes a lot of sense.
What you’re describing reminds me of how tools like Google’s Stitch are approaching UI generation — instead of generating arbitrary code, they start from structured HTML/design foundations and then build on top of that.
It feels like the same idea in a different domain:
not letting the system generate freely, but constraining it within a known structure from the beginning.
In your case, it’s about trust and permissions.
In UI systems, it’s about tokens, semantics, and layout constraints.
Both are trying to solve the same underlying problem:
reduce the space of “unsafe” decisions before they ever make it into production.
I’m starting to think the future isn’t better generation — it’s better constraint systems around generation.
Curious if you see this evolving toward more “structured-first” AI tools rather than free-form ones.
that Stitch parallel is actually really sharp. constrained generation - whether it's design tokens or permission templates - forces the model to work within a safe envelope instead of hallucinating its way into free-form territory. I wonder if the same idea applies to infra: give the AI a library of known-good IAM snippets to compose from rather than letting it generate policies from scratch. less creative, but the blast radius when it's wrong is way smaller
Why didn’t the industry pay attention to graph databases more? It was always the better model, but very few, very skilled professionals map entire ontologies. Palantir does, Bloomberg does, big money. Now if your knowledge graph structure is compressed time and time again, reduced to 10-15 KB, you’re pushing more context through the context window and hence more fidelity. I can get the Jensen Huang engineer 250k to 5k? Context efficiency, reasoning hops, context density. I’d like to graph and give the files for you to play with in any environment. That’s my proposal. However, I will not use search engines personally, think what that means.
graph databases had the tooling problem for a long time - the queryability and dev experience lagged behind relational. LLMs are actually changing that equation because natural language queries map better to graph traversal. compressed ontologies as context is a real pattern, especially for domain-heavy apps where the graph IS the knowledge
That’s a really sharp extension of the idea. Constrained generation feels like the common pattern here — whether it’s design tokens, UI components, or permission templates, the goal is to keep the model operating inside a safe envelope instead of letting it hallucinate freely.
Applying this to infra makes a lot of sense. Composing IAM policies from a library of validated snippets is far more predictable than generating them from scratch.
You lose some flexibility, but the reduction in blast radius when things go wrong is absolutely worth it in production.
composing from validated snippets is the right direction. basically policy-as-code with a model-friendly interface. you get the expressiveness of natural language generation but bounded by proven primitives - audit trail becomes way cleaner too
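As a sketch of what composing from vetted primitives could look like (the snippet names and simplified ARNs are hypothetical, not any real policy library):

```python
import json

# Hypothetical library of vetted, least-privilege policy fragments.
# Real deployments would source these from reviewed IaC modules; the
# ARN shapes here are simplified for illustration.
APPROVED_SNIPPETS = {
    "s3-read-bucket": {
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": "arn:aws:s3:::{bucket}",
    },
    "sqs-consume": {
        "Effect": "Allow",
        "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage"],
        "Resource": "arn:aws:sqs:::{queue}",
    },
}

def compose_policy(requests: list) -> dict:
    """Build a policy only from approved snippets; unknown names are rejected."""
    statements = []
    for name, params in requests:
        if name not in APPROVED_SNIPPETS:
            raise ValueError(f"not an approved snippet: {name}")
        snippet = json.loads(json.dumps(APPROVED_SNIPPETS[name]))  # deep copy
        snippet["Resource"] = snippet["Resource"].format(**params)
        statements.append(snippet)
    return {"Version": "2012-10-17", "Statement": statements}
```

The model (or the dev) can only name and parameterize fragments; it cannot mint `"Action": "*"` into existence, which is where the blast-radius reduction comes from.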
Well, what if I say I knowledge graph any question I have through Claude Code CLI, and my Google is the Claude on the browser. What am I?
someone who built a workflow that actually fits how they think. using Claude as both the graph and the search layer is interesting - you lose the precision of structured queries but gain the flexibility. depends what you are optimizing for
That’s a really interesting way to frame it — especially the idea of turning Claude into a knowledge graph interface rather than a generator.
I think we’re seeing the same shift in UI systems. The problem isn’t that models aren’t good enough at generating — it’s that we’re letting them operate in an unconstrained space.
What we’re experimenting with is pushing structure before generation — forcing everything through component specs, tokens, and layout constraints. The model isn’t generating UI anymore, it’s composing within a system.
So yeah, I’m starting to believe the future isn’t better generation — it’s better boundaries around generation.
you are someone who found a workflow that actually fits how you think. the tool matters less than whether it extends your cognition or interrupts it
The trust misconfiguration pattern is painfully familiar — and not just from AI-generated code. We run 15 microservices at a fintech startup, and before we centralised our env management, we had the exact same issues with human-written code: DB connection strings with full admin creds, no connection pooling limits, env files copy-pasted over Slack.
The fix was a shared Zod schema that validates every env var on startup. Dev-local mode warns, production mode calls process.exit(1). It catches the "fastest path to working" problem before it ships. AI just makes that anti-pattern happen faster. The real gap is that nobody validates the boring infrastructure decisions — human or AI.
the human-written fintech example is actually the better data point tbh - same patterns, just slower velocity. AI just makes it impossible to ignore anymore because it surfaces 3x faster. centralising env management was the right call regardless of how you got there
Exactly right. The AI just made it impossible to ignore what was always there. We'd been accumulating env variable drift for years — every new service copied from the last one with slight modifications nobody documented. An AI agent flagged 23 inconsistencies in minutes. A human would've found the same things eventually, just spread over months of "hmm, that's weird" moments. The centralised env management was the real fix. The AI was just the mirror that made us stop pretending the mess wasn't there.
23 inconsistencies in minutes is a good example of the pattern - not finding new problems, just surfacing old ones faster. The "hmm that is weird" moments that used to disappear into the backlog now come back with receipts. Centralized env management makes sense as the fix but the harder part is usually the cultural shift: actually enforcing it after you find the problems.
We added a drift checker in PR and pre-commit hooks, so instead of culturally aligning we made it an automated process, which prevents this from ever happening again.
That is the right call - baking it into the pipeline so it is not a cultural ask anymore. The cultural alignment approach works until someone is rushing a deploy at 11pm. Automated constraint is the only version that holds under pressure.
100% agree on the automated constraint point. We learned this the hard way — we had a "best practices" doc for env management that everyone agreed with and nobody followed consistently. The moment we replaced it with a Zod schema that validates on startup and crashes the process in production if something's wrong, compliance went from "mostly" to "always." The doc was aspirational. The schema is a fact. Cultural alignment is important for things that genuinely need judgment. For things that have a clear right answer — like "don't deploy with an empty DB connection string" — just make the wrong thing impossible.
the zod schema approach is the right call - trust the invariant in code, not the dev's memory of a doc. we hit the same thing with config validation for AI agents: once we moved from 'the readme says X' to 'startup crashes if X is wrong', compliance stopped being a conversation. the cultural buy-in you already had makes the technical enforcement feel collaborative rather than punitive too, which is a nice side effect.
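For readers not on TypeScript: the Zod pattern this sub-thread describes (validate on startup, warn in dev, crash in prod) translates to a few lines in any language. A dependency-free Python sketch, with illustrative variable names and patterns:

```python
import os
import re
import sys

# Illustrative requirements -- real schemas would cover every env var.
REQUIRED_ENV = {
    "DATABASE_URL": re.compile(r"^postgres(ql)?://"),  # must look like a DSN
    "API_KEY":      re.compile(r"^\S{20,}$"),          # non-empty, non-trivial
}

def validate_env(env) -> list[str]:
    errors = []
    for name, pattern in REQUIRED_ENV.items():
        if not pattern.match(env.get(name, "")):
            errors.append(f"{name} is missing or malformed")
    return errors

def check_on_startup(env=os.environ) -> None:
    errors = validate_env(env)
    if not errors:
        return
    for err in errors:
        print(f"[env] {err}", file=sys.stderr)
    if env.get("APP_ENV", "development") == "production":
        sys.exit(1)  # crash early in prod; in dev, warn and continue
```

The schema is the enforcement mechanism: an empty connection string becomes a process that refuses to start, not a deploy that limps along.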
Great writeup. Security in vibe coded projects is the elephant in the room that nobody wants to talk about. I run a small hosting platform and we added automated YARA scanning on every single deploy exactly because of this. You would not believe the stuff people upload: hardcoded API keys, SQL injection galore, sometimes even test credentials for production databases. The AI generates code that works but it has zero concept of security. I think every hosting provider that targets vibe coders needs to have some form of automated scanning built in, it's not optional anymore
honestly the YARA scanning is smart - we ended up doing something similar for VibeCheck, scanning for patterns that scream "AI wrote this and forgot security exists". the hardcoded credentials thing you mentioned is everywhere, I found them in like 40% of the repos I scanned, sometimes buried 3 folders deep like the dev thought obscurity = security. tbh I think you're right that hosting platforms need to build this in because most vibe coders won't think about it until something breaks. the AI just doesn't model threat scenarios, it models "make the tests pass"
YARA at deploy is smart but what happens when it catches something? Do you block the deploy entirely or just flag it? Asking because the article mentions that most of these issues aren't malicious, they're trust misconfigurations. Blocking a deploy for a hardcoded key makes sense. Blocking it for open CORS might be too aggressive if the dev intended it for a public API. Where do you draw the line on what's a hard block vs a warning?
the block vs flag question is real and context-dependent. hardcoded secrets = block, no debate. open CORS or overpermissioned IAM - I would flag with severity + required acknowledgment rather than hard block. the key is forcing a conscious decision rather than hoping the dev noticed. "I know this is open CORS and I intended it" is a very different state than it slipping through unreviewed
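That block/flag/acknowledge split can be expressed as a tiny policy table. A sketch (categories and rules are illustrative, not VibeCheck's actual behavior):

```python
from enum import Enum

class Action(Enum):
    BLOCK = "block deploy"
    ACK   = "flag, require acknowledgment"
    WARN  = "warn only"

# Illustrative mapping -- real rules would be configurable per project.
POLICY = {
    "hardcoded_secret":   Action.BLOCK,  # no legitimate reason in source
    "open_cors":          Action.ACK,    # sometimes intended (public API)
    "overpermissive_iam": Action.ACK,
    "missing_rate_limit": Action.WARN,
}

def gate(findings: list, acknowledged: set) -> tuple:
    """Return (deploy_allowed, messages). ACK findings pass only if acknowledged."""
    allowed, messages = True, []
    for finding in findings:
        action = POLICY.get(finding, Action.WARN)
        if action is Action.BLOCK:
            allowed = False
            messages.append(f"{finding}: {action.value}")
        elif action is Action.ACK and finding not in acknowledged:
            allowed = False
            messages.append(f"{finding}: {action.value}")
        else:
            messages.append(f"{finding}: ok ({action.value})")
    return allowed, messages
```

An acknowledged open-CORS finding deploys; a hardcoded secret never does, acknowledged or not. That encodes the "conscious decision" distinction directly.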
Who are you worried about? Like you’re worried for what group of people? The security company that watches what?
mostly worried about the builders themselves - a lot of vibe coders ship without ever looking at what permissions their apps are granting or what data they are exposing. no malicious intent, just not in the habit of thinking about attack surface
As someone who builds iOS apps entirely with AI-assisted development, the trust misconfiguration pattern is painfully real. I've caught myself shipping overly permissive Supabase RLS policies because the AI-generated code "just worked" in dev.
The gap between "works in dev" and "safe for real users" is exactly what keeps me up at night. My approach has been adding a manual security review checkpoint before every App Store submission — but that doesn't scale well when you're shipping multiple apps.
The idea of background tooling that catches these issues during the coding session (not after) feels like the right direction. Would love to try VibeCheck on my projects.
yeah the Supabase RLS thing hit me too. had a policy that was basically `true` for reads because the AI wanted to make the demo work fast. caught it only when I read through the migration files line by line before launch. now I ask the AI to specifically audit the policies after writing them - weirdly works better than asking it to write secure ones in the first place. still manual but at least it's a separate pass with a different prompt
We can knowledge graph these solutions. I cannot say what will or will not work, but I can help get more efficient context architecture to assist in your solution development. We all can assist each other in solving big problems together.
collective intelligence angle is interesting. the challenge is keeping context sharp - more nodes can help or hurt depending on how well the graph is structured
Honestly, I’d rather have a recorded conversation, a real conversation, and load that text into model. Not tax my prefrontal cortex to write a prompt. I’m too dumb
conversation as context is underrated. transcripts carry nuance that prompts lose. the "too dumb to prompt" framing is honest - most people think in conversation not structured instructions
You can just give me a call
haha lets keep it in the comments for now - more interesting for others to read too
I’m going to give this to you in a few hours
no rush
Mykola — here it is. I owe you a proper response, not scattered comments from behind the wheel.
the thread earlier was raw signal — fragments of an idea typed between errands. this is what happens when I actually sit down and structure it. a page of structured context enables 170 reasoning paths. a few comments in a thread enable maybe 3. you deserve the full page.
you said: "battlefield: a prod codebase that was 100% vibe coded in one weekend. weaponry: a security scanner and a code review. my bet is on the scanner finding something before you finish reading the first file."
I accept. but here's the twist.
I'm going to vibe code a real production-ready app this weekend. 100% AI-generated, one weekend, shipped. but before I write a single line, I'm loading a security knowledge graph built from YOUR findings into the model's context. your article, your comment thread, your data — structured as typed relationships with rules, constraints, and the vulnerability patterns you found across 100 codebases.
your own data, aimed back at your scanner.
then you scan it with VibeCheck. if your scanner finds vulns — you win, the scanner beats the graph. if it comes back clean — the graph wins, structured context prevented what scanning would have caught after the fact.
either way, we both learn something real. and the experiment is documented.
I took your article + this thread + your twitter findings and compressed them into a knowledge graph. 58 entities, 89 typed relationships, 8 layers, ~320 reasoning paths. paste it into any model and ask it anything about your security domain.
but it's not just a diagnostic. layers 6-8 are solutions — things you can actually build, including the "something watching in the background" you said nobody's solved yet (spoiler: it's solved — Claude Code hooks do exactly this).
there are also 5 weak signals I found in your own words that you might not have noticed the significance of. your data told a bigger story than the article captured. the graph connected them.
I already ran the A/B test before posting. same model (ChatGPT, extended thinking), same prompt: "what's my minimum security checklist for shipping a vibe-coded app this weekend?"
flat article: generic 11-item OWASP checklist. zero references to your actual findings. zero named vulnerability patterns from your data. the model ignored your article and fell back to training data.
knowledge graph: 3-tier ship/no-ship decision framework. 16 identifiable graph traversals. 6 named patterns from your scans (open CORS, overprivileged accounts, hardcoded creds, unsanitized passthrough, f-string SQL, secrets in source). 4 direct citations of your data. referenced your failure modes by name.
same model. same prompt. 16:0 on domain-specific insights. the flat version didn't score lower — it didn't score.
the full math is in Layer 10 of the knowledge graph. run the same test yourself — paste your flat article into one chat, paste the KG into another, same question, compare. that's the proof.
10 layers. 78 entities. 124 typed relationships. ~450 reasoning paths. from one page of structured .md — the same token budget as your flat article, but 170x more ways for a model to reason through it. that's the claim. the A/B test is the proof. the knowledge graph is in my next comment.
and when you're ready to define the battlefield — framework, scope, what "production-ready" means to you — tell me. I'll build it live, KG-loaded, and you scan the result. loser buys the other one a coffee via the internet.
let's go. 🇺🇦
We both win this way, I would argue... but I need more context when you're concerned with context architecture.
Here's the knowledge graph. Copy everything below and paste into any AI model.
Vibe-Coded Security Vulnerability Knowledge Graph
Source: "I Scanned 100 AI Codebases" — Mykola Kondratiuk (VibeCheck)
58 entities | 89 typed relationships | 8 layers | ~320 reasoning paths
GRAPH SCHEMA
LAYER 1: ROOT CAUSE MODEL
LAYER 2: VULNERABILITY TAXONOMY
LAYER 3: FAILURE MODE ANALYSIS
LAYER 4: INTERVENTION MODEL
LAYER 5: DOMAIN TRANSFER PATTERNS
LAYER 6: SOLUTION ARCHITECTURE — What You Can Build
STAGE 1: Pre-Generation Context Injection
STAGE 2: During-Generation Real-Time Monitoring
STAGE 3: Post-Generation (Enhanced Current VibeCheck)
LAYER 7: PRODUCT EVOLUTION — VibeCheck Becomes a Platform
LAYER 8: IMPLEMENTATION QUICKSTART — Build This Weekend
QUERY INTERFACE
to use this graph, paste it into any AI model and traverse relationships:
SQL_INJECTION --CAUSED_BY--> STRING_CONCATENATION
DEVELOPER_ATTENTION --PREVENTS--> TRUST_MISCONFIGURATIONS
AI_CODE_GENERATION --IGNORES--> TRUST_BOUNDARIES
DEV_ENVIRONMENT --MASKS--> PRODUCTION_VULNERABILITIES
REAL_TIME_MONITORING → see LAYER 6 STAGE 2 (it's solved)
WEEKEND_BUILD_1 → Security Rules Generator (4 hours, highest ROI)
V1 → V2 → V3 → V4 (scanner → context gen → copilot → platform)
COMMUNITY_SECURITY_KG --ENABLES--> FRAMEWORK_SPECIFIC_RULES
YOUR NUMBERS
LAYER 9: THE CHALLENGE — Scanner vs. Knowledge Graph
Experiment Protocol
What the Experiment Proves (Either Way)
Weak Signals That Inform the Challenge
GRAPH TOTALS (including challenge layer)
LAYER 10: THE MATH — A/B Test Results
I ran this before posting. same model (ChatGPT, extended thinking). same prompt. two different inputs.
Test A: flat article pasted in (no graph)
Test B: this knowledge graph pasted in
the numbers
the 170x claim — what it means
GRAPH TOTALS (final)
78 entities. 124 relationships. 10 layers. ~450 reasoning paths. grew from 31 entities in v1 to 78 in final — each refinement compounded, never restarted.
the A/B test proved 16:0 on one question. the graph has 170 paths per page. test it yourself — paste this into any model, ask it anything about vibe-coded security, and compare against the flat article. the graph does the reasoning. you ask the questions.
built with graphify.md — domain knowledge → portable .md
okay this is a proper setup - shipping it, then I scan it. I am in. build it this weekend and drop the link, VibeCheck will run a full scan and we post the results. tbh I am curious whether context-loaded generation actually holds up against the patterns I found or just looks cleaner on the surface.
This is super interesting to validate statistically. Definitely, having security principles in prompts helps, and so does having at least a semblance of an idea of the architecture of your application. Now, I get that that’s exactly what some people can’t define, because they aren’t experts. But let’s say you’re writing a piece of Python tooling: tell it which version you want, that you want pyproject.toml instead of requirements.txt, some file structure you’d typically want, that you want to use pydantic for settings. And more importantly, what you don’t want it to do. In my experience this increases the chance of the output being easier to validate and check, because it aligns with my mental model of how the tool should probably work. Knowing where to look becomes easier.
yeah the "you have to know what you want" problem is real. i ran into this exact wall - if you can articulate the architecture you already kind of know what you are doing. the devs who get burned the hardest are the ones who don't have that mental model yet, so they just accept whatever the model outputs without questioning it. the pyproject.toml type of constraint helps a lot though, having a checklist of non-negotiables you paste in every time basically forces a floor on quality
I found Dev.to last night. I don’t even know why I’m here; it feels like being rejected by people who certainly don’t help me. But again, I’m trying my best to show what I think, in my opinion, has merit.
hey, for what it's worth - your takes have merit. the graph / context angle is genuinely interesting, it pushed me to think differently. dev.to is worth sticking around for
Or better e why don’t I give you a compressed md file of a graph database of your request on here later this evening? Then you go work with it and get more context out of it, like I want to
sounds good
hey, probably worth deleting that - public comments are indexed. keep it here