I built a bot trained on my own X bookmarks and likes. Around 50,000 of them, accumulated over years of lurking, arguing, and clicking the save but...
what surprised me building something similar - once you add the style layer it stops feeling like retrieval and starts feeling like you actually said it. RAG without that is basically a glorified search.
The style layer point is the one most people skip when they write about RAG. Without it you have a search engine with extra steps. With it you have something that feels authored and that gap is where the interesting questions live. The retrieval is doing the work but the style layer is what makes it feel like thought rather than lookup. Which is exactly what unsettled me about it.
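A minimal sketch of the two layers being discussed here, assuming word overlap as a stand-in for real embeddings and a plain prompt string as the "style layer". The function names, sample notes, and prompt wording are all hypothetical, not the setup from the original post.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Retrieval layer: rank chunks by Jaccard overlap with the query."""
    q = tokens(query)

    def score(chunk: str) -> float:
        c = tokens(chunk)
        union = q | c
        return len(q & c) / len(union) if union else 0.0

    return sorted(chunks, key=score, reverse=True)[:k]


def build_prompt(query: str, retrieved: list[str], style_samples: list[str]) -> str:
    """Style layer: wrap the retrieved chunks in voice examples for the model."""
    voice = "\n".join(f"- {s}" for s in style_samples)
    facts = "\n".join(f"- {r}" for r in retrieved)
    return (
        f"Write in the voice of these samples:\n{voice}\n\n"
        f"Use only these retrieved notes:\n{facts}\n\n"
        f"Question: {query}\n"
    )


# Hypothetical saved notes, purely for illustration.
chunks = [
    "Chunking strategy decides what the retriever can find",
    "My favourite pasta recipe uses garlic",
    "Style prompts shape tone, not facts",
]
question = "what does chunking strategy decide"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top, ["Short sentences. No hedging."])
print(prompt)
```

The style samples only change how the answer sounds. Whatever `retrieve` returns is what the answer is actually about, which is why the two layers are worth keeping apart.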
honestly, style is the finish work but retrieval is the foundation — i've seen polished prose wrapped around the wrong chunks and it's worse than a bad UI because it feels authoritative. if the chunking is off, no style layer rescues it. style multiplies retrieval quality, it doesn't replace it.
"Feels authoritative" is the exact failure mode. Bad UI signals itself. Broken layout, missing button, you know something is wrong. Bad chunking with good style signals nothing. The output reads clean and confident and is wrong about what it retrieved. That's a harder bug to catch because the surface looks fine.
Which means chunking strategy is actually a trust problem, not just a performance problem. The style layer is entirely downstream of it. You're right that it multiplies retrieval quality rather than replacing it. The implication is that most RAG evaluations are measuring the wrong layer...
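One way to measure the retrieval layer on its own, before any styled text is generated, is recall@k against a small hand-labelled set. This is a rough sketch under assumptions: the chunk ids and the relevance labels below are made up for illustration, and recall@k is a swapped-in standard metric, not something the thread's author says they used.

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the relevant chunks that appear in the top-k retrieved ids."""
    if not relevant_ids:
        return 0.0
    hits = set(retrieved_ids[:k]) & relevant_ids
    return len(hits) / len(relevant_ids)


# Hypothetical example: ranked retriever output and hand-labelled relevant chunks.
retrieved = ["c7", "c2", "c9", "c4"]
relevant = {"c2", "c4"}

print(recall_at_k(retrieved, relevant, k=2))  # 0.5
print(recall_at_k(retrieved, relevant, k=4))  # 1.0
```

A fluent answer built from the top 2 chunks here would read fine while missing half of what it needed, and this number flags that without anyone having to read the prose.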
yeah, from the PM side there's no ticket for this. model sounds right, PM ships it. the gap surfaces weeks later when the agent contradicts itself across artifacts and someone finally traces it back to the retrieval layer.
Such a bookmark brain is an idea that has stayed fascinating ever since computers got more powerful and the internet connected us to so many new sources and ideas. Fuzzy full-text search seemed to be an obvious answer, but what about synonyms? Tagging, hyperlinking, and bookmarking showed that we still needed curation and communication. And what about information getting outdated? And, now, what about AI hallucination?
Your project post points out two important things: AI, or any other processing, doesn't make bad data good. And many of "our own thoughts" are just shaped by our past experience and input. You conclude that the dataset is the author. I think that's a generalization neglecting actual analysis, creativity and findings. But in the context of AI and thinking, you're right.
What's also missing in many current discussions: facts still matter, creativity still matters, and infinite monkey authors still won't write like Shakespeare. And AI won't, either.
The pushback on "dataset is the author" is fair and I'll take it partially. The essay was making a narrower claim: that for this specific system, the quality of output is downstream of curation quality, not model quality. The generalization breaks down when you introduce analysis and creativity that genuinely recombines rather than retrieves. The honest version is: the dataset is the dominant author and the model is a compositor with limited creative range.
The infinite monkey point cuts deeper than it looks though. It's not just that volume doesn't produce quality. It's that style is irreducible to pattern frequency. Shakespeare's compression, the specific weight of a line, isn't statistically recoverable from enough similar text. That's the ceiling. It's not a scale problem...
I literally created an account just to say this...
John Stuart Mill argued that we cannot logically confirm other people have minds, but can only assume they do based on an argument from analogy. Mill reasoned that because other humans have bodies like our own and exhibit similar outward signs and behaviors, it is reasonable to infer they possess similar inner lives and consciousness, even though this remains an unproven assumption rather than certain knowledge.
Actual Intelligence entails not only learning and retrieval, but deduction and generalization as well; and, most important of all, does not equate frequency of repetition with truth. A single observation, the single instance of an unexpected occurrence, accepted as an undeniable fact, is sufficient to call into question a lifetime of assumptions.
As a thought experiment, consider the following: if an LLM were trained on 8th century knowledge, how intelligent would we consider it?
The Mill framing cuts right to it. We extend mind-attribution by analogy — similar body, similar behavior, therefore similar inner life. LLMs pass the behavioral test well enough that the analogy fires automatically, even when the substrate is completely different.
The 8th century thought experiment is the sharper point though. The output would look primitive, and we'd call it unintelligent, but the architecture would be identical. Which means what we're actually measuring when we say "intelligent" is the quality of the training distribution, not the reasoning process. That's not a small concession.
The part I'd push back on slightly: deduction and generalization. LLMs do something that looks like both, often convincingly. The failure mode isn't absence of those behaviors. It's that they're not grounded. A single contradicting observation should update the model. It doesn't. The weights are frozen. That's the real gap.
You're right to push back on deduction and generalization (induction).
My mental model of human cognition and intelligence isn't formalized just yet, and those, I'll readily admit, were recently incorporated temporary placeholder concepts - better than what I had, but not quite good enough. The fact of the matter is that it's difficult to put my finger on it and put it into words. I glimpse but can't grasp. Discussion helps.
Inference, induction and deduction are integral to reason and logic; and reason and logic are inextricable from memory. Memory, as it so happens, is precisely what we've built and termed AI - a memory retrieval system. Induction is how it learns, from specific instances it derives general principles based on pattern recognition, which it can then apply to specific instances, such as forming a coherent sentence, and that's a form of deduction.
But its deduction is contextual, relying on those same patterns. Which is why, if asked: "Should I walk or drive to the carwash that's two blocks away?", it answers: "Walk". Because it's relying on language patterns for 'reasoning', not actually reasoning about what's been said. There is no conception of reality, as such, and therefore nothing to ground the words.
Deep learning systems that simulate outcomes based on known principles, that's a whole other story. That's much closer to human cognition. It's narrow intelligence, admittedly, but a step in the right direction. A necessary one.
The carwash example is the cleanest version of this I've seen. The answer is linguistically correct and situationally wrong and the model has no mechanism to know the difference because there's no referent, only pattern. "Two blocks away" is a spatial fact. The model processes it as a token relationship.
The deep learning simulation point is where it gets interesting though. Physics simulators, protein folding, weather models — those systems are grounded in the thing they're modeling. The outputs are constrained by reality, not just by prior outputs. That's a genuinely different epistemic situation. The question I keep coming back to: is the gap between those systems and current LLMs architectural, or is it about what the training signal is coupled to? Because if it's the latter, grounding is a design choice, not a fundamental limit.
I believe it to be architectural.
RLHF is... interesting, in its implications. I find it decidedly ironic that neural networks were proposed and presented as an alternative to rule-based programming, to avoid the near-impossible complexity of writing rule-based programs that could suit every eventuality, as I've heard Geoffrey Hinton state multiple times, only to turn around and say: "Well, that worked, but not quite well enough. Maybe if we correct it with rules...".
That's how I think of RLHF: "if output is X, replace with Y" or "if gethumaninput(x>y): x else y". That being the case, I question the premise itself. How far did the neural network architecture for machine learning progress the field towards actual intelligence? How much of what we consider noteworthy and significant about LLMs is actually derived or influenced by human feedback? Of course, guardrails are an essential security feature, so that step cannot be avoided, but my understanding from ablative models is that RLHF cements both the security features and the adherence and coherence of the model, which does beg the question: machine learned, or rule-based?
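For comparison with the "if x>y" picture, here is a rough sketch of how a human preference is commonly turned into a training signal for a reward model: a pairwise (Bradley-Terry style) loss over two scalar scores. This is an assumption about the typical textbook recipe, not a description of any specific vendor's pipeline, and the scores below are made-up numbers.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Written as a numerically stable softplus of the negated margin.
    """
    margin = reward_chosen - reward_rejected
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical reward scores for two answers a human compared.
agrees = preference_loss(2.0, 0.0)     # model already ranks the preferred answer higher
undecided = preference_loss(0.0, 0.0)  # model has no opinion yet, loss is log(2)
disagrees = preference_loss(0.0, 2.0)  # model ranks the rejected answer higher

print(agrees, undecided, disagrees)
```

The human comparison does act like a rule at the labelling step, but what gets optimized is a smooth penalty on the score gap, so the learned behavior generalizes (or fails to) the same way the rest of the network does.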
I suspect that RLHF -- again, ironically -- is actually making the models dumber and less accurate. After a long back and forth with GPT on a complex subject I knew little about, I realized it judged its own output as incorrect. When I pointed this out, it replied: "That's intentional. We're peeling away layers of complexity. The language is imprecise but pedagogically useful." My conclusion? Humans might prefer erroneous information that's understandable to correct information that is not. Hence my suspicion.
But getting back to the point at hand, architectural vs training signal. I believe we've modelled memory. With memory, knowledge. With knowledge, the appearance of reason and intelligence. But human reason functions more like a simulation: a world model based on items and properties, known relationships and assumptions alike, where deconstruction and recombination take place, and trial and error is allowed to occur until the desired eventuality is arrived at. That is why, in my view, AlphaGo, AlphaFold, and the like are the real breakthrough in the field. Applying the technology to build a world model that isn't based on an abstract symbolic representation of the world -- that being language.
We've essentially distilled an abstraction from an abstraction.
That's a world model that is too far removed from the world it models, for my particular taste.
"Abstraction from an abstraction" is the formulation I've been circling without landing on. Language is already a lossy compression of experience. Training on language trains on the compression, not the thing compressed. AlphaFold works because the training signal is coupled to physical reality — the protein either folds or it doesn't. The feedback is grounded. LLMs get feedback on whether the output satisfies a human, which is a judgment about the abstraction, not the territory.
The RLHF point is sharper than it first looks. If the model learns that humans prefer confident, digestible wrongness over accurate complexity, then RLHF is systematically optimizing away from truth and toward palatability. That's not a guardrail problem — that's the training objective working exactly as designed, just toward the wrong target.
Where I'd push: is the fix a better world model, or a better feedback signal? AlphaGo has a world model but it's narrow by design. Generalizing that architecture requires grounding every domain in something as unambiguous as a Go board or a protein structure. Most of the domains we actually care about don't have that...
Building your own AI bot teaches the messy parts quickly. The demo is easy. The real learning starts with prompts, memory, bad outputs, edge cases, and figuring out where human control still needs to stay in the loop.
Congratulations on shipping your first AI bot, Dan! There is a massive difference between reading about how LLMs work and actually wrestling with the APIs and logic to build something yourself.
Interesting experiment! But that's a rather "humbling" take - that LLMs are doing not much more than regurgitating what we've put into them!
Well, at a basic level - however, I still think (not based on any real deep insight, but on "things I've read") that when you just put enough "stuff" into it (on a gargantuan scale - what the likes of OpenAI, Anthropic, Google etc etc do), and you let the LLM do the "probabilistic recombinatorial" thing, some rather amazing results come out - is that maybe what they call "emergent behavior"?
So they're saying that, at a fundamental level, we don't understand how LLMs work - we do understand the "low level mechanics" (the math and the hardware), but not how it produces its frankly baffling results - for lack of a better explanation, we call it "emergent behavior" ...
However - isn't the same true for our own brains, that fundamentally we don't really understand how it works? Again, we do understand the "low level mechanics" (the neurons and the synapses and so on), but not how it produces its astonishing results - again, we call it "emergent behavior" ...
The parallels are striking - the biggest difference is that AI doesn't have "initiative of its own", while humans do (and I think we should be happy about that) - probably that's because we were driven by evolutionary pressures (survival) to have that 'initiative'? It's kind of inherent in all biological organisms - LLMs don't reproduce, don't evolve, don't try to "survive" - if they would, would that be "AGI"? But it would obviously be scary!
Another difference is that LLMs don't really seem to be capable of "original thought" - or would they (if "scaled up enough")? The litmus test would be whether LLMs could at some point come up with a genuinely new theory (like Einstein's relativity theory, or quantum mechanics) - then again, most humans aren't capable of that either, it requires a rare stroke of genius and a lot of "coincidence" - right time, right place, pre-existing knowledge to work with ...
The emergence parallel is real but I'd push back on the symmetry. With brains, we don't understand the mechanism or the output reliably. The behavior surprises us in both directions.
With LLMs, we understand the mechanism completely and still can't predict the output. That's a different kind of mystery.
The Einstein test is the sharp version of this: relativity required rejecting the dominant framework, not recombining within it. A model trained on Newtonian physics at sufficient scale would produce better Newtonian predictions, not special relativity. The gap isn't scale. It's that genuine theoretical breakthroughs require treating anomalies as load-bearing rather than noise to average over.
Well you're right :-)
I was under the impression that (neuro)biologists understand the basic mechanisms, and would be able to fully understand the nervous system of very simple animals (there's a roundworm that has exactly 302 (hermaphrodite) or 383 (male) neurons) - but even that turns out to be currently out of reach - according to Google:
"No, scientists do not fully understand how the nervous system works, even in the simplest animals. While we have extensively mapped the physical wiring of some organisms, a complete structural map does not automatically reveal how the brain computes information to produce complex, dynamic behaviors."
Bummer - so even for that tiny roundworm, its nervous system is too complex for us to fully understand - let alone if you multiply that roughly 100-million-fold to get a human brain! That seems a somewhat "unsolvable" puzzle - I was a bit too optimistic there, lol ;-)
Maybe AI could someday help us solve that puzzle - using an artificial "brain" that we don't understand to help us understand another brain which we also don't understand ;-)
P.S. yeah I also don't believe that current LLMs would ever have the "originality" to invent a totally new theory (relativity etc), even if we'd scale them up by orders of magnitude - maybe the point is they're "rigid" and lack the plasticity of a biological brain? I can't prove it but there must be fundamental differences ...
The C. elegans point actually sharpens the original argument. 302 neurons, a fully mapped connectome, and behavior still isn't predictable from the wiring diagram alone. That's the thing: structure doesn't explain dynamics.
Which cuts both ways: it makes the brain-LLM parallel weaker, not stronger. We can't explain either from first principles but the reason we can't is different in each case.
For LLMs it's the dimensionality of the weight space. For biological systems it's that the map is incomplete and the territory keeps changing. The roundworm's neurons are plastic. The weights aren't.
Yeah you're right, probably the similarities (if there are any) are just skin deep - and I have the feeling that we don't even know what we don't know! ;-)
Spot on! This hits the nail on the head—RAG is essentially about retrieval, not true reasoning. A highly curated, local dataset is the real soul of AI; the model is just the assembly worker. Defending our data sovereignty and becoming true data curators is the ultimate hard-core asset in this era.
This is a solid take on AI. One thing I'm curious about: how does this handle edge cases at scale? We hit some interesting bottlenecks around the 50-agent mark that forced us to rethink the architecture.
Followed you — keen to follow your work on this! 👋
I won't pretend I understood everything you said, but it did give me a closer look at what's actually happening under the hood.
Thanks for the insight 🙏
Glad the under-the-hood framing landed. That's exactly what I was going for...