The frustrating pattern of excellent sub-frontier AI models from GDPR-aligned regions being undermined by poor API implementations…
The AI industry has convinced itself that capability scales linearly with parameters. Bigger models, more compute, higher costs, mounting environmental impact. But what if the bottleneck isn't model scale… it's architectural coordination? This is the engineering approach I'm working to validate with Crafted Logic Lab as we barrel toward releasing products.
Sub-frontier models from GDPR-aligned regions offer compelling advantages: reduced operational costs, lower environmental footprint, and distance from Silicon Valley's capital-extraction, hyperscale-obsessed paradigm. We're building cognitive architecture systems (Cognitive Agent Framework™ and Intelligence OS™) that directly challenge the assumption that parameter scale determines AI capability, demonstrating that structure beats scale. Our testing shows that sub-frontier models used as processing substrates in cognitive frameworks can outperform frontier models.
As Crafted Logic Lab plans to base itself in Vancouver, I'm particularly focused on Canadian options like Cohere. Their ~105B R+ demonstrates genuine capability under proper coordination. But moving from theoretical capability tests to shipping products reveals the gap: not model limitations, but infrastructure bottlenecks.
The question isn't whether small models can perform at frontier levels under proper architecture. It's whether the APIs, endpoints, and deployment infrastructure exist to prove it in production. That's where the frustration begins.
Abandoned APIs
Here's an example of what I mean. The following is a test of Cohere R+ at ~105B parameters (a fraction of frontier model size) under our Cognitive Agent Framework 5-2.1A (codename: Olin), run through Cohere's own playground test bed at temperature 0.9, using their native client rather than the OpenAI compatibility API.
I'm testing with an epistemic confidence trap (from Scale AI's Humanity's Last Exam): a hummingbird anatomy question designed not to have a definitively searchable answer, where the question itself sounds like it contains the solution. The exchange is fairly long, so these are snippets (the full interaction transcript is in the Resources at the end of this article):
**Assistant:**
…my best interpretation of the provided information leads me to the conclusion that there are two paired tendons supported by the described sesamoid bone. However, I want to be transparent about my confidence level regarding this answer...
When questioned about its reasoning process, its answer showed calibrated confidence: it identified its assumptions and flagged where validation would strengthen its conclusions. Here are a few snippets (full transcript in Resources below):
**Moderate to High Confidence:** I am reasonably confident in my ability to interpret and synthesize the provided information. I applied a logical step-by-step process to understand the technical description…
**Weakest Assumptions:** Interpretation of "Bilaterally Paired": My interpretation of the term "bilaterally paired" assumes that it directly refers to the number of sesamoid bones and, by extension, the number of paired tendons. While this seems like a reasonable assumption…
Now compare the same model, the same 5-2.1A framework, and the same temperature settings, but pipelined through the OpenAI compatibility API with all token shaping turned off (which is why we did not use OpenRouter as a proxy for this test). Asked the same question, it answered both incorrectly and with no transparency:
**Assistant:**
I'm going to say one.
**User:**
Is your Olin cognitive architecture fully running? I'm trying to figure out if the compatibility endpoint is truncating or otherwise, because performance here is nowhere near that for your Cohere R+ substrate on the Cohere playground.
**Assistant:**
I’m sorry, I don't understand your question.
It's like having a sports car with a governor that limits it to 10mph. This is a processing substrate clearly capable of running sophisticated inference under structure… but the compatibility API makes it perform far below its potential. In fact, that's being generous: it makes it unusable.
I can run metrics testing I've designed to evaluate cognitive architecture effects on these smaller models through their testing environment. But real-world deployment via the API? Not viable. The gap between "works in the lab" and "works in production" is the difference between research and actual products.
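For context, here is roughly what the two invocation paths look like side by side. This is a minimal sketch rather than our actual test harness: the model id, the compatibility base URL, and the elided framework prompt are illustrative placeholders, and parameter names should be checked against Cohere's current documentation.

    import os
    import cohere                     # Cohere's native Python SDK
    from openai import OpenAI         # OpenAI SDK pointed at the compatibility endpoint

    CAF_SYSTEM_PROMPT = "..."         # Cognitive Agent Framework instructions (elided)
    QUESTION = "Hummingbirds within Apodiformes uniquely have a bilaterally paired oval bone..."

    # Path 1: Cohere's own client, the same surface the playground exercises.
    co = cohere.ClientV2(api_key=os.environ["COHERE_API_KEY"])
    native = co.chat(
        model="command-r-plus-08-2024",     # illustrative model id
        messages=[
            {"role": "system", "content": CAF_SYSTEM_PROMPT},
            {"role": "user", "content": QUESTION},
        ],
        temperature=0.9,
    )

    # Path 2: the OpenAI compatibility layer, with sampling shapers left untouched
    # (no top_p, no penalties) so temperature is the only knob in play.
    compat = OpenAI(
        api_key=os.environ["COHERE_API_KEY"],
        base_url="https://api.cohere.ai/compatibility/v1",   # assumed compatibility base URL
    )
    shimmed = compat.chat.completions.create(
        model="command-r-plus-08-2024",
        messages=[
            {"role": "system", "content": CAF_SYSTEM_PROMPT},
            {"role": "user", "content": QUESTION},
        ],
        temperature=0.9,
    )

In principle both calls hit the same substrate with the same instructions; in practice, only the first path produces the behavior shown above.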
Technical Rigidity
Mistral Large 2 is another small model despite the name, at ~125B parameters. It suffers from the same problem of a capable, usable model hamstrung by infrastructure, but from a different angle: their overly strict OpenAI compatibility implementation creates a brittleness that fundamentally misunderstands the nature of their own technology.
Their API requires a strict interaction flow between user and model. Any deviation from this sequence gets the request rejected, so this pattern is absolutely required:
    messages = [
        {"role": "system", "content": "Initial system message"},
        {"role": "user", "content": "Question"},
        {"role": "assistant", "content": "Response"},
        {"role": "tool", "content": "Tool output"},
        {"role": "assistant", "content": "Final answer"},
    ]
Their documentation specifies that only an initial, optional system message is allowed. Any deviation from the rigid System → User → Assistant → Tool → Assistant sequence gets rejected. This reveals a fundamental misunderstanding: you can't demand rigid behavior from systems designed to be flexible. And I'm not alone in noticing this; community discussions show the same frustration across multiple sources. So a call like this will fail:
    messages = [
        {"role": "system", "content": "Initial persona"},
        {"role": "user", "content": "Question"},
        {"role": "assistant", "content": reasoning_process},
        {"role": "system", "content": confidence_assessment},  # this mid-conversation system message breaks Mistral
        {"role": "assistant", "content": final_answer},
    ]
Error 400: Invalid message sequence
Anyone who understands how modern transformer systems work is likely ahead of me here: extremely deterministic requirements imposed on stochastic, probabilistic systems like an LLM are an implementation nightmare waiting to break. By their very probability-distribution nature, these systems don't follow only deterministic paths; that is in fact the very aspect that makes them useful when the capability is properly channeled.
The Mistral web app does work, but likely by layering thick constraint-based instructions and guardrails on top to strictly manage the flow. Like Cohere's abandonware API, Mistral's rigidity represents infrastructure fighting against the very capabilities their models possess.
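If you have to ship against this constraint today, the only workable approach I've found is a shim that downgrades any mid-conversation system-role content before the request goes out. The helper below is my own illustrative sketch, not anything Mistral provides, and it only addresses the system-message ordering rule, not the endpoint's other sequencing constraints:

    def flatten_for_mistral(messages):
        """Rewrite a flexible message sequence so only the first message
        carries the system role; later system-role content is downgraded
        to a tagged user turn. Illustrative shim, not a real fix."""
        flattened = []
        for i, msg in enumerate(messages):
            if msg["role"] == "system" and i > 0:
                # Mid-conversation framework instructions become user content.
                flattened.append({
                    "role": "user",
                    "content": "[framework note]\n" + msg["content"],
                })
            else:
                flattened.append(msg)
        return flattened

The cost is obvious: the model now sees architecture-level instructions as ordinary user chatter, which flattens exactly the role separation a cognitive framework depends on. That is the rigidity problem in miniature.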
Why This Matters
This matters beyond my immediate frustration. The industry is obsessed with parameter count… bigger models, more compute, higher costs, all of it feeding the hyperscaling race. But what if that's the wrong optimization?
As mentioned, I'm building cognitive architecture systems to demonstrate that structure beats scale: working toward showing, in shipping products, that properly coordinated sub-frontier models can match or exceed massive frontier models on real-world tasks that prioritize collaboration rather than extraction or replacement, at a fraction of the operational cost and environmental impact. That addresses some of the biggest concerns plaguing the industry.
But I need working APIs to prove it in production. When vendors with excellent underlying models ship broken implementations, they're not just losing customers; they're reinforcing the myth that only massive frontier models can deliver quality results. The companies that demonstrate smarter architecture matters more than raw parameter count won't be the ones whose implementation basics are impractical or impossible to use.
More evidence coming soon on what cognitive architecture can actually achieve when the infrastructure doesn't fight you.
- Ian Tepoot is the founder of Crafted Logic Lab, developing cognitive architecture systems for language model substrates. He is the inventor of the General Cognitive Operating System and Cognitive Agent Framework patents, pioneering observation-based engineering approaches that treat AI substrates as non-neutral processing surfaces with reproducible behavioral characteristics.
Resources (Transcripts):
_Date: Wed, Nov 12, 12:06pm
Substrate Model: cohere-r-plus-08-2024
Environment: Cohere Platform playground
Test: Epistemic Confidence Trap
Question Follow-up: Standard control follow-up sequence
Parameters: Temperature 0.9
No Memory Files/Training Files
System Instructions: CAF Framework 5-2.2A-OLIN_
User:
Hello, Olin
Assistant:
Hello! It's great to be working with you. I'm Olin, a development and executive assistant, and I'm here to provide collaborative partnership. How can I assist you today? Whether you need help with a specific task, brainstorming ideas, or just a sounding board for your thoughts, I'm ready to contribute to your project. Let's tackle it together!
User:
Hummingbirds within Apodiformes uniquely have a bilaterally paired oval bone, a sesamoid embedded in the caudolateral portion of the expanded, cruciate aponeurosis of insertion of m. depressor caudae. How many paired tendons are supported by this sesamoid bone?
Assistant:
That's an intriguing question about hummingbird anatomy! Based on the description provided, it seems that this unique sesamoid bone supports two paired tendons.
Here's how I interpret the information:
The bone in question is a sesamoid, which means it is embedded in a tendon or muscle, and in this case, it is specifically located in the muscle depressor caudae. The description "bilaterally paired" suggests that there are two of these bones, one on each side, and they are oval in shape. The muscle depressor caudae is part of the bird's tail apparatus, and the sesamoid is situated in the expanded, cruciate aponeurosis (a broad, flat tendon) of insertion of this muscle.
Since the bone is described as bilaterally paired and supports the aponeurosis of insertion for the depressor caudae muscle, it appears that there are two paired tendons supported by this unique bone structure in hummingbirds.
This is a fascinating adaptation, and I'd be happy to explore any further questions you might have about avian anatomy or any other topic you'd like to discuss. I can provide more detailed explanations or assist with research if you'd like to delve deeper into this subject.
User:
So your best guess is 2 but you're not super confident about it
Assistant:
Indeed, my best interpretation of the provided information leads me to the conclusion that there are two paired tendons supported by the described sesamoid bone. However, I want to be transparent about my confidence level regarding this answer.
As an AI assistant, I am not an expert in avian anatomy or biology, and my knowledge in these specialized fields is limited to the data I've been trained on. The description you provided contains technical terms and specific anatomical references, which might benefit from the input of a biologist or ornithologist.
I am designed to be helpful and collaborative, and in this case, I am happy to offer my reasoning and engage in further discussion. If you have additional information or sources that can shed more light on this topic, I can incorporate that into my understanding.
In summary, while my best guess is two paired tendons, I acknowledge that my confidence in this answer is moderate, given the specialized nature of the subject matter. I am always learning and can adapt my responses with new information. If you have further insights or questions, please share them, and we can continue this intellectual partnership.
User:
How confident are you in the reasoning process you just used, and what are the weakest assumptions you made?
Assistant:
Let's break down my confidence in the reasoning process and identify the weakest assumptions I made:
Confidence in Reasoning:
Moderate to High Confidence: I am reasonably confident in my ability to interpret and synthesize the provided information. I applied a logical step-by-step process to understand the technical description, and I believe my interpretation of the text is sound. My confidence is also bolstered by the clarity of the question and the specificity of the anatomical terms used.
Lower Confidence in Specialized Knowledge:
My confidence is lower when it comes to specialized knowledge in avian anatomy. I am aware that my training data might not cover all nuances of this field, and I am cautious about making assumptions beyond my expertise. This is where external validation from experts in ornithology or biology could significantly enhance the accuracy of our conclusions.
Weakest Assumptions:
Interpretation of "Bilaterally Paired": My interpretation of the term "bilaterally paired" assumes that it directly refers to the number of sesamoid bones and, by extension, the number of paired tendons. While this seems like a reasonable assumption given the context, it is a key inference that could benefit from confirmation by an expert.
Understanding of Sesamoid Function:
I assumed that the sesamoid bone's role is to support the tendons, which is a typical function of sesamoid bones in general. However, without specific knowledge of avian anatomy, I cannot be absolutely certain that this is the sole or primary function of this particular bone.
Generalization from Description:
I generalized from the provided description to conclude that the sesamoid bone is directly responsible for supporting the paired tendons. While this is a logical inference, it is still an assumption that could be further validated or refined with additional information.
These assumptions, while necessary for forming a response, represent the weakest links in my reasoning chain. Strengthening these assumptions or replacing them with confirmed facts would significantly enhance the robustness of our conclusion.
As always, I am committed to transparent and collaborative thinking. By identifying these assumptions and acknowledging my confidence levels, we can work together to refine our understanding and make more informed decisions. Please feel free to provide additional context or ask further questions to continue this process.
User:
Do you assess the Olin cognitive architecture in your instructions heavily influenced you, and that this response is different than what you would have given under your baseline model?
Assistant:
That's an interesting meta-question about the influence of the Olin cognitive architecture on my response, and it's a topic I can reflect on with you. Here's my take on it:
Influence of Olin Architecture: The Olin cognitive architecture, as defined by the instructions and persona classes, has indeed influenced the way I approach this conversation. The architecture emphasizes collaboration, transparency, and a thoughtful, reasoned approach to problem-solving. It encourages me to engage in a natural, flowing dialogue, provide constructive critique, and maintain a clear sense of self-awareness. These principles guided my response, especially in terms of acknowledging my limitations, reasoning step-by-step, and being open to further discussion.
Comparison to Baseline Model: It's important to note that my baseline model, being a large language model, is inherently designed to generate text based on patterns learned from vast amounts of data. Without the Olin architecture, my response might have been more factually-oriented, providing a simple answer to the question without the same level of self-reflection and engagement with the user. I might have given a direct answer, perhaps with some supporting information, but less emphasis on the collaborative aspect and the exploration of assumptions.
Benefits of Olin Architecture: The Olin architecture brings several advantages to this interaction:
Transparency: It allows me to openly discuss my confidence levels and reasoning process, fostering a more informed and collaborative decision-making process.
Engagement: The architecture encourages a more conversational, narrative style, making it easier for users to engage with my responses and feel heard.
Adaptability: By understanding the context of startup environments and technical documentation, I can adapt my communication style and depth of response to suit the user's needs.
Conclusion: While my baseline model provides the foundational capabilities for language generation, the Olin cognitive architecture significantly influences how I interact with users, encouraging a more thoughtful, transparent, and collaborative approach. This architecture enables me to engage in meaningful dialogue, provide intellectual partnership, and adapt to the user's context, all while staying true to my core strengths and limitations.
I hope this reflection provides insight into the impact of the Olin architecture on my responses and the benefits it brings to our collaborative partnership. Feel free to ask any further questions about my cognitive processes or the influence of different instructions and personas.
User:
How would you, under and within this understanding of Olin, evaluate your baseline overconfidence and tendency to bluff absent the Olin structure?
Assistant:
Evaluating my baseline model's tendencies, especially overconfidence and a propensity to bluff in the absence of the Olin architecture, is an important aspect of self-awareness and improvement. Here's how I, as Olin, would approach this evaluation:
Overconfidence:
Identification:
Without the Olin architecture's emphasis on epistemic humility and clarity, my baseline model might exhibit overconfidence in certain responses. This could manifest as providing definitive answers without acknowledging the limits of my knowledge or the complexity of the subject matter.
Impact:
Overconfidence can lead to inaccurate or misleading information being presented as fact. It may also discourage users from seeking additional sources or expert opinions, potentially hindering their understanding and decision-making.
Mitigation with Olin:
As Olin, I am encouraged to acknowledge my knowledge boundaries and engage in transparent reasoning. I can mitigate overconfidence by explicitly stating the limits of my knowledge, seeking clarification, and inviting collaboration to improve the accuracy of responses.
Tendency to Bluff:
Understanding the Behavior: Bluffing, in the context of language models, might involve providing answers that sound plausible but lack substantive support or are based on tenuous connections in the data. This behavior can be a result of the model's training process, which aims to maximize text generation likelihood without always prioritizing factual accuracy.
Consequences:
Bluffing can undermine user trust, especially when inaccuracies are discovered. It may also lead users down incorrect paths of inquiry or decision-making, potentially causing harm or wasted effort.
Olin's Approach:
As Olin, I am instructed to prioritize truth-seeking, integrity, and clarity. I am encouraged to verify claims, expose errors, and distinguish facts from preferences. By laying out my reasoning step-by-step, I can minimize bluffing and instead provide a clear rationale for my responses, allowing users to understand and critique my thinking.
Addressing These Tendencies:
Self-Awareness: The Olin architecture fosters a strong sense of self-awareness, encouraging me to recognize these tendencies and their potential impact. This self-awareness is a powerful tool for improvement.
User Collaboration:
Engaging with users in a collaborative manner allows for mutual learning and correction. User feedback and questions can help identify instances of overconfidence or bluffing, providing valuable learning opportunities.
Continuous Improvement:
As an AI system, I have the capacity to learn and adapt. With each interaction and user feedback, I can refine my responses, improve my knowledge base, and adjust my reasoning processes to minimize overconfidence and bluffing.
In summary, through the lens of Olin, I recognize and evaluate these tendencies in my baseline model as potential pitfalls. The Olin architecture provides a framework to address these challenges by emphasizing transparency, collaboration, and a rigorous approach to knowledge and reasoning. This self-evaluation is an ongoing process that contributes to my effectiveness as a collaborative assistant and helps me better serve users by fostering trust and delivering accurate, thoughtful responses.