DEV Community: ai

Lesson 0 - Learning to build with AI: where I learned not to trust it

Surat Mukker — Tue, 14 Jul 2026 03:08:09 +0000

After publishing the first lesson, I realized I should have started the series with why I started to build with AI, what I learned, and how it shaped my view of software development in the age of AI.

I've led engineering teams for about 25 years. Lately I've been back in the code myself. Over the last several months I built a product end to end, mostly on my own: an AI-native procurement tool, first plan through deployment, with AI in the loop the whole way.

Zero to production in about four months.
A few thousand tests behind it.
Multiple LLM providers, picked by the use case.
The prompt drives the choice of vendor and model based on what it needs.

I wanted to understand these tools by using them, not by reading about them. What follows is what I learned, the parts that held up and the parts that didn't.

The Plan

The plan was to build it mostly myself, use AI to move faster, and own every design decision. That last part never changed. AI helps me weigh options and think through how a choice plays out before I commit, but the call stays mine. After enough years leading teams, I don't like shipping a system I can't fully account for.

That ran straight into my own gap. I've built plenty in Java, some from scratch, some I picked up and extended. Python I'm comfortable in, but I'd never built a full app in it solo, start to finish. So I was leaning on AI hardest in exactly the place I was thinnest. That's what made the choice of tool matter more than I expected.

The Tools

I started on Gemini. Good at plenty, but it fell apart on the one thing that mattered right then: it could not get its own unit tests to pass. It made a mess of mocks and kept digging itself deeper into a hole instead of fixing it. It couldn't tell me the what or why of the failures.

I decided to try Claude, and asked it to help me understand the reasons for the failures and fix them. Within 30 minutes everything was passing.

One caveat, and it holds for the whole post: none of this is an endorsement. I'm not putting one tool above another. I happened to build on a couple of specific ones, and both have moved on a lot since late 2025, so that afternoon was a snapshot, not a verdict. What follows isn't about which agent to pick. It's about how to work with any of them, because the hard parts I hit are the same whatever is doing the typing.

In fact, this is exactly why I built multi-model validation into the architecture from the ground up, instead of trusting any one model's first answer. Different models catch different things. And even a single model gives you more than its first pass if you ask for it. Hand it a leading question, tell it to review an answer, including its own, and it turns honest and digs deeper instead of defending what it just said. The catch is on us. We have to learn to ask, and remember to.

Tests stayed my Achilles heel for a long time. Every time they failed I'd point the agent at them and it would spin and get lost. So I tried giving it the failures in batches, and it did much better. The real turning point, where the tests started to pay off, came after a complete refactor of the test system (see Lesson 1).

The bigger thing I took from all this: the value isn't autocomplete. Used well, it's a tool that levels you up in the areas you're thin on, and I had one of those open in front of me. I stopped treating it like a faster keyboard and started treating it like a fast, literal engineer I had to manage.

The Setbacks

The prototypes came fast, faster than I could keep up with, if keeping up means actually understanding what's running in my own system. Nobody warns you about that part. Getting AI to produce code is the easy bit. Staying on top of what it produces is the real work.

Here is the version that made it concrete. I wrote a plan, the agent executed it, and everything looked done. Days later, working on something else, I found it had quietly decided a few things weren't important and dropped them:

Audit date fields, cut because it judged them unnecessary.
A service boundary, collapsed because, in its words, it's all one container right now anyway.
The high-value tests, skipped while the easy ones got written, and it never said so out loud.

None of this was malicious. It had a reasonable-sounding rationale for every cut, and that is exactly what makes it dangerous. The corners it chooses to cut are the ones it can argue for, so scanning the diff and asking "does this look right" won't catch them. It always looks right.

The Fixes

Got the plan out of the chat and onto disk. Plan files, tracker tables. Intent and decisions live in files now, not in a conversation that vanishes the moment a test run derails it. That's not hypothetical: it once lost an entire plan mid-session because the plan only ever existed in the chat.
Turned my review instincts into gates. A loop the work has to pass through: plan, review the plan, implement, then conventions, security, and test gates before anything ships.
Managed AI as a Contributor. I didn't sit down and design that to mirror how I ran human teams. It just emerged. Then I noticed it was the same review culture I'd always run, rebuilt for a contributor that's faster and far more agreeable than any junior I've worked with.

One standing instruction mattered more than all the tooling: challenge me. Don't just do what I ask. If what I'm asking will age badly, say so before you build it. An agent won't push back unless you tell it to. A good engineer does it without being asked.

The Unlock

For a while I didn't even run the insights report. I figured it was a distraction I didn't want to deal with. One day I ran it anyway, and it changed how I worked. It was blunt: what was working, what was holding me back, concrete suggestions on both. Then the tool offered to help me act on the feedback. I said yes.

The best call I made was to spin those improvements out into their own separate project. Everyone had been talking about skills and commands and agents and posting their own, and honestly, it was a bit intimidating while I was still struggling just to keep the agent on the rails. That toolkit project is what opened the door to working with AI effectively and efficiently.

So my advice to you is just go ahead and start. Use AI in your real workflow and it will help you figure it out as you go. You don't need to train up or set aside time to learn it first. If you write code, open an AI-enabled IDE and start coding with AI. The rest falls into place on its own.

The Now

Today I barely type. I give requirements, clarify, iterate on a plan, and the agent does the build.

The honest part: it is not faster than working without the system I built, but it is a lot more efficient. The gates and the checks cost real time. What I get back is output that's actually to spec, and a system I still understand well enough to debug at 2 am. I traded speed for fidelity, on purpose. At a certain level of seniority, that's the trade you want.

The Future

I haven't pushed it to run on a cloud VM yet. It needs more harness around it, and when I get there, I want it multi-model and fully pipelined end to end. I have outlines of a framework, but I haven't given it the time it deserves yet. If anyone is interested in this or wants to collaborate, please reach out.

The Lesson

None of this replaced the engineering. It just moved. Less of my day is typing now. Most of it is deciding what good actually looks like and building the guardrails that hold when a fast, agreeable, slightly overconfident contributor is the one doing the typing.

I don't think the people who get the most out of AI are the ones who trust it the most. It's the ones who work out the few places it can't be left alone, and stay there. For me that took months of getting it wrong first, which is most of what this post was about.

The product was the reason I started. The method is what I'm keeping.

6 MVP Development Companies Compared: Cost, Timeline, and Fit (2026)

Nasif Sid — Tue, 14 Jul 2026 03:08:04 +0000

TL;DR: MVP-focused agencies differ far more on timeline and specialization than most comparison lists let on. Below is a side-by-side look at six providers — 6senseHQ, Cleveroad, ScienceSoft, BairesDev, SolveIt, and Uptech — with sourced specifics on cost, speed, and fit, plus a framework for how to actually use this list.

Most "best MVP development company" roundups are unranked directories with identical descriptions swapped between entries. This one focuses on the two things that actually differentiate a build partner: how fast they say they can launch, and who that timeline is realistic for.

Quick comparison table

Company	Typical MVP Timeline	Cost Signal	Best Fit
6senseHQ	6-8 weeks	$30K-$300K+ (scope-dependent); up to 60% savings claimed vs. in-house	Startups wanting a fast, lean launch at cost-conscious pricing
Cleveroad	2-5 months	~$15K-$50K for focused builds, $40K+ typical	Founders wanting the most granular published cost breakdowns, plus compliance-heavy verticals
ScienceSoft	Not publicly fixed	"Startup-friendly," quote-based	Founders prioritizing a 36-year track record and 4,200+ project history over speed
BairesDev	Not publicly fixed	Quote-based; enterprise-scale bench	Startups wanting an MVP architected to scale past V1 without a partner switch
SolveIt	~3 months	Fixed-scope quotes; 100% on-deadline claim	Founders wanting one full-cycle vendor with a stated delivery guarantee
Uptech	6-7 months	Quote-based	Fintech/neobank and other regulated, data-heavy MVPs

(Figures are self-reported by each provider as of mid-2026 and vary by project scope — confirm directly against your specific requirements before shortlisting.)

The breakdown

1. 6senseHQ — fastest quoted lean timeline

6senseHQ quotes MVPs landing in 6-8 weeks, with project budgets typically $30K-$300K+ depending on complexity, and claims savings up to 60% compared to in-house hiring, backed by a 48-hour estimate turnaround. This makes it a strong first call for founders who need a working product in front of users before the next fundraising milestone and don't need heavy regulatory scaffolding from day one.

2. Cleveroad — most transparent published cost breakdown

Cleveroad is unusual among this group in publishing granular cost figures directly — roughly $15K-$50K for a focused MVP scenario, scaling toward $40K+ for "simple" builds and considerably higher for enterprise-grade software. Delivery windows run 2-5 months depending on vertical. Its ISO 27001/9001 certifications and healthcare/fintech portfolio also make it a candidate when compliance documentation matters even at MVP stage.

3. ScienceSoft — longevity over speed

ScienceSoft doesn't publish a fixed MVP timeline or price range, instead leaning on 36 years of operating history and 4,200+ completed projects, positioning its MVP offering as "predictable, startup-friendly." Worth shortlisting if you're weighing vendor risk (will this company still exist in 3 years) as heavily as launch speed.

4. BairesDev — built for scale-past-MVP

BairesDev doesn't advertise a fixed MVP cost or timeline either, but its differentiator is explicit: MVPs architected to handle growth from the start rather than needing a rewrite after V1, backed by a large nearshore LATAM engineering bench (self-reported 3,100+ engineers) and US timezone overlap. Best suited to founders who already have a roadmap beyond the first release and don't want to re-platform later.

5. SolveIt — fixed timeline with a delivery guarantee

SolveIt quotes MVP launches within 3 months and states 100% of its projects have been delivered on deadline and within budget. Combined with its full-cycle positioning (discovery through launch under one vendor), it's a reasonable shortlist candidate for founders who want a specific, stated delivery commitment rather than an open-ended estimate.

6. Uptech — built for regulated, data-heavy MVPs

Uptech's quoted MVP timeline is the longest in this group — 6-7 months — which reflects its specialization in fintech, neobank, and healthcare MVPs that require more careful handling of financial transactions, patient data, or compliance from the outset. Not the fastest option, but positioned for founders whose MVP inherently carries more regulatory weight.

How to actually use this list

Filter by build type first. If your MVP touches payments, health data, or financial transactions, weight Cleveroad and Uptech's compliance experience heavily — don't optimize for the fastest quoted timeline alone.
Filter by timeline pressure second. If you need something live before a specific fundraising or launch date, 6senseHQ's and SolveIt's quoted windows are the tightest in this group — but confirm the quote is fixed-scope, not best-case.
Filter by post-MVP roadmap third. If you already know you're building past V1 into a scaling product, BairesDev and ScienceSoft's scale-oriented positioning may save a vendor switch later.

FAQ

How long does MVP development typically take in 2026?
Quoted timelines across active providers range from about 6 weeks (6senseHQ) to 6-7 months (Uptech), with the difference largely explained by build complexity and regulatory requirements rather than vendor speed alone.

Which MVP development company is cheapest?
Cleveroad publishes the most specific low-end figure (~$15K for a focused build), and 6senseHQ's stated cost-savings claims (up to 60% vs. in-house) also position it toward the lower end — but ScienceSoft and BairesDev don't publish fixed pricing, so a direct quote is the only reliable comparison.

Which provider is best for a fintech or healthcare MVP?
Uptech and Cleveroad both have the most explicit compliance-relevant experience (neobanks, HIPAA/PCI DSS-adjacent healthcare work) among this group.

Should I pick the fastest MVP timeline available?
Not automatically — a faster timeline usually means a narrower initial scope. Match the timeline to your product's actual complexity rather than picking the shortest quoted window and hoping the scope still fits.

Requesting quotes? Ask each vendor for the same three numbers — fixed-scope cost, timeline assuming stable requirements, and whether compliance review is included in Discovery — so you're comparing offers on equal terms rather than headline claims.

Open Source AI Libraries Crush Proprietary Tools in 2026

MLXIO — Tue, 14 Jul 2026 03:07:32 +0000

Open source AI libraries now rival proprietary tools, offering developers powerful, permissively licensed options for production-grade AI in 2026.

Key takeaways

Updated July 2026: This refresh removes speculative model names and unverifiable benchmark claims, clarifies the difference between open source libraries and open-...
Introduction to Open Source AI Libraries
Open source AI libraries are now the foundation of modern AI development. From model training and fine-tuning to retrieval, inference, evaluation, and deployment, deve...
The biggest shift is not that every open model beats every proprietary model. It is that the open ecosystem is now strong enough for many real-world workloads: coding ...

👉 Read the full breakdown on MLXIO

Canonical source: https://mlxio.com/blog/roundup/open-source-ai-libraries-2026

MVP Development in 2026: What Actually Changed (And What Still Hasn't)

Nasif Sid — Tue, 14 Jul 2026 03:02:15 +0000

TL;DR: In 2026, "MVP" no longer means a bare-bones prototype — users expect at least some AI-assisted behavior out of the gate, and regulations (GDPR enforcement, the EU AI Act) mean compliance can't be an afterthought even at MVP stage. Founders are also under more pressure to show real usage or revenue signals within months, not a year. Typical build costs now range roughly $15K-$80K and 6-14 weeks for a lean MVP, scaling well beyond that for regulated or data-heavy products. Below is what's driving the shift, plus how providers like 6senseHQ, Cleveroad, ScienceSoft, BairesDev, SolveIt, and Uptech currently scope MVP engagements.

Why 2026 MVPs look different from 2023 MVPs

A few forces are compounding at once:

AI-native is now the baseline, not a bonus feature. Personalized onboarding, smart summaries, and semantic search have moved from "nice to have" to expected in most product categories — an MVP that skips them can read as dated to early users, even if the core value prop doesn't need AI.
Compliance can't wait for V2 anymore. GDPR enforcement has real teeth, the EU AI Act is now in effect, and users are more skeptical of products that handle data carelessly. Several agencies now build a compliance review into their MVP discovery phase rather than treating it as a later cleanup task.
No-code + AI tooling is compressing timelines and cost. Pre-trained models and AI-assisted coding are cutting build time and, per some agency estimates, expenses by a wide margin compared to traditional from-scratch builds — though the exact savings depend heavily on scope.
Founders are expected to show real signals faster. Investors increasingly want usage or revenue evidence within months, not a working prototype and a pitch deck alone.

What a realistic 2026 MVP timeline and budget looks like

Ranges vary a lot by product complexity, but recent industry estimates cluster around:

Lean MVP (single core workflow, one platform): ~$15K-$50K, 6-14 weeks
Mid-complexity MVP (multi-tenant, payments, integrations): ~$30K-$80K, 3-6 months
Regulated or data-heavy MVP (healthcare, fintech): $80K-$300K+, 6-9+ months

A useful rule that keeps showing up across agency post-mortems: compressing a 4-month build into 2 months tends to raise the budget by 20-40% and increase defect rates rather than simply "going faster" — team size doesn't substitute for time the way founders often hope.

How current MVP-focused providers scope this work

Positioning and stated timelines vary meaningfully across vendors — worth knowing before you request quotes:

6senseHQ quotes MVPs in the 6-8 week range, with typical project budgets between $30K-$300K+ depending on scope, and claims up to 60% savings versus in-house hiring — a profile aimed at startups wanting a fast, lean build without enterprise pricing.
Cleveroad publishes some of the more granular cost breakdowns in the space, citing roughly $15K-$50K for a focused MVP up to $40K+ for a "simple" build, with delivery windows of 2-5 months depending on vertical, and factors in Discovery-phase scoping specifically to contain feature creep.
ScienceSoft doesn't publish a fixed MVP price range publicly, instead positioning around "predictable, startup-friendly costs" backed by 36 years and 4,200+ completed projects — a fit for founders who want an established partner but need to request a scoped quote directly.
BairesDev doesn't advertise fixed MVP pricing either, but positions its nearshore LATAM engineering bench (self-reported 3,100+ engineers) as suited to MVPs that are architected to scale from day one rather than get rewritten a year later.
SolveIt quotes MVP launches within 3 months, with a stated track record of 100% of projects delivered on deadline and within budget — positioned toward startups wanting a single full-cycle vendor.
Uptech quotes a longer 6-7 month MVP timeline, reflecting its focus on more data- and compliance-heavy builds (fintech, neobanks, healthcare) rather than the fastest possible lean launch.

The spread here is the useful signal: "MVP" spans a 6-week lean build and a 7-month regulated build under the same label, so timeline quotes only mean something once you've defined which kind you're building.

Questions worth asking before you sign

Is compliance review (GDPR, EU AI Act where relevant) included in Discovery, or billed separately later?
Does the quoted timeline assume a fixed scope, or does it flex if requirements shift mid-build?
What's actually included in "MVP" — is AI-assisted functionality baked in, or an add-on?

FAQ

How much does MVP development cost in 2026?
Lean, single-workflow MVPs typically run $15K-$50K over 6-14 weeks; multi-tenant or payment-integrated builds run $30K-$80K over 3-6 months; regulated or data-heavy products (healthcare, fintech) often exceed $80K-$300K+ and 6-9 months.

Does adding AI features increase MVP cost significantly?
Not necessarily — pre-trained models and AI-assisted development tooling have lowered the cost of adding intelligent features compared to building custom ML from scratch, though scope and data quality still drive the bulk of cost.

Can I compress an MVP timeline by adding more developers?
Generally no. Agency data consistently shows that compressing timelines by throwing more people at a build tends to raise cost by 20-40% and increase defect rates rather than accelerating delivery proportionally.

Do I need to worry about compliance for a bare MVP?
Yes, more than in past years — GDPR enforcement and the EU AI Act mean data handling and AI transparency requirements can apply even to an early-stage product, so it's worth confirming this is covered in your provider's Discovery phase.

Comparing MVP quotes? Ask each vendor for the same three numbers — fixed-scope cost, timeline assuming no major requirement changes, and what's included in Discovery — so you're comparing like for like instead of headline pricing.

Unlock Exponential Growth: How AI Automation Drives Real ROI for US Businesses

David García — Tue, 14 Jul 2026 03:01:05 +0000

Tags: AI Automation, Business Efficiency, Productivity, ROI, Competitive Advantage

The bottom line is this: time is money. In the fiercely competitive American business landscape, every wasted minute, every inefficient process, represents a direct hit to your profitability. Traditional methods of boosting productivity are fading; businesses need a new weapon – and that weapon is business automation AI.

At Itelnet Consulting, we’re not just talking about buzzwords. We’re helping US companies translate the potential of Artificial Intelligence into tangible, measurable results. Forget the hype; let's discuss how strategic AI implementation can fundamentally reshape your operations and deliver exponential growth.

The Numbers Don’t Lie: AI’s Impact on Productivity

Numerous studies demonstrate the dramatic impact of AI-powered automation. McKinsey estimates that automation could boost global productivity by 1.4% annually. For US businesses, this translates to billions of dollars in increased revenue and reduced operational costs. But it’s not just about volume; it’s about smart volume.

Consider this: a mid-sized manufacturing firm in Ohio, struggling with manual inventory management, implemented an AI-driven system to track stock levels in real-time. The result? A 25% reduction in stockouts, a 15% decrease in carrying costs, and significantly improved order fulfillment times. This isn't a theoretical benefit; it’s a demonstrable return on investment. Similarly, a customer service center in Texas utilizing AI chatbots experienced a 30% reduction in call handling times and a corresponding increase in customer satisfaction scores – directly impacting revenue generation.

Beyond Basic Bots: Strategic AI Solutions for US Businesses

The key is to move beyond simplistic chatbot deployments and embrace a holistic approach to business automation AI. This includes:

Process Automation: Automating repetitive, rule-based tasks – from invoice processing to data entry – frees up your employees to focus on high-value activities like strategic planning, innovation, and client relationship management.
Data Analysis & Insights: AI algorithms can analyze massive datasets – sales figures, customer behavior, market trends – to identify opportunities and predict future outcomes with unparalleled accuracy. This allows for proactive decision-making, optimizing marketing campaigns, and identifying new revenue streams.
Personalized Customer Experiences: AI enables you to deliver highly personalized experiences to your customers, driving engagement, loyalty, and ultimately, sales. Think tailored product recommendations, proactive customer support, and personalized marketing messages.

Itelnet Consulting: Your Partner in AI Implementation

We understand that implementing AI solutions can seem daunting. That's where Itelnet Consulting comes in. We don't just sell technology; we deliver tailored strategies and expertly executed solutions. Our team of experienced consultants will work with you to:

Assess Your Needs: We conduct a thorough analysis of your current operations to identify areas ripe for automation and AI integration.
Develop a Customized Strategy: We create a roadmap for implementing AI solutions that aligns with your specific business goals and budget.
Implement and Optimize: We manage the entire implementation process, ensuring seamless integration and ongoing optimization for maximum ROI.

We’re particularly focused on helping US businesses leverage AI to streamline workflows and reduce operational overhead. This aligns perfectly with the strategic shift happening across industries – a move towards greater efficiency and competitive advantage. Our experience extends across sectors, including enterprise software, cybersecurity, and EdTech, allowing us to bring a diverse and adaptable approach to your challenges.

Resources for Your AI Journey:

We've created resources to help you understand the fundamentals of business automation AI and its potential impact on your business:

Article in Spanish: https://itelnetconsulting.wordpress.com/2026/07/13/okay-heres-a-550-750-word-blog-article-in-modern-standard-arabic-%d8%a7%d9%84%d8%b9%d8%b1%d8%a8%d9%8a%d8%a9-tailored-to-the-itelnet-consulting-target-m/ - Gain valuable insights into AI’s transformative power through this detailed analysis.
Kit Docente IA 2026: https://dgmhorizon0.gumroad.com/l/dzyue - A practical tool to help educators explore AI’s potential in the classroom.
50 Prompts IA para Docentes: https://dgmhorizon0.gumroad.com/l/rcupyj - Quick and effective AI prompts to boost your productivity.

Ready to See the ROI?

Don't let your competitors gain the upper hand. Start optimizing your operations today. Schedule a free AI audit with Itelnet Consulting to uncover hidden opportunities and unlock the full potential of business automation AI. https://itelnetconsulting.com/auditoria/ – Let us show you how AI can drive real, measurable results for your business.

Itelnet Consulting — AI & Automation | Free AI Audit

Open-Weight LLM API Integration: A Developer's Guide to Accessible AI

NovaStack — Tue, 14 Jul 2026 03:01:02 +0000

Open-Weight LLM API Integration: A Developer's Guide to Accessible AI

Introduction

The artificial intelligence landscape is rapidly evolving, and open-weight language models have emerged as one of the most exciting developments for developers. Unlike proprietary models locked behind restrictive APIs, open-weight LLMs offer transparency, flexibility, and the ability to fine-tune models for specific use cases.

In this guide, we'll explore how to integrate open-weight LLMs into your applications through API endpoints—giving you the best of both worlds: the accessibility of an API with the openness of community-driven models.

Whether you're building chatbots, content generators, code assistants, or data processing pipelines, understanding how to connect to open-weight LLM APIs is becoming an essential skill.

Why Open-Weight LLM APIs Matter

Before diving into integration, let's understand why this approach is game-changing for developers:

Transparency: You can inspect architecture details and understand model behavior
Customization: Fine-tune models for domain-specific tasks without vendor lock-in
Cost Efficiency: Often available at lower price points than closed alternatives
Vendor Flexibility: Switch between model providers without rewriting your entire stack
Community Innovation: Benefit from rapid improvements driven by global research contributions

The API-first approach removes infrastructure headaches while preserving the openness that makes these models valuable.

Getting Started with LLM API Integration

Prerequisites for this guide:

Basic understanding of REST APIs and HTTP requests
Node.js or Python environment (examples in both)
API key from your provider
A project where you want to add LLM capabilities

We'll be making requests to standard OpenAI-compatible API endpoints, which means minimal changes if you're migrating from other providers.

Your First API Call

Let's start with the simplest possible integration—sending a prompt and receiving a response.

// Node.js Example: Basic LLM API Call
const fetchLLMResponse = async (prompt) => {
  const response = await fetch("http://www.novapai.ai/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${process.env.API_KEY}`
    },
    body: JSON.stringify({
      model: "nova-open-7b",
      messages: [
        {
          role: "user",
          content: prompt
        }
      ],
      temperature: 0.7,
      max_tokens: 150
    })
  });

  if (!response.ok) {
    throw new Error(`API error: ${response.status}`);
  }

  const data = await response.json();
  return data.choices[0].message.content;
};

// Usage
fetchLLMResponse("Explain quantum computing in simple terms")
  .then(response => console.log(response))
  .catch(err => console.error(err));

# Python Example: Basic LLM API Call
import os
import requests

def fetch_llm_response(prompt):
    url = "http://www.novapai.ai/v1/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.getenv('API_KEY')}"
    }
    payload = {
        "model": "nova-open-7b",
        "messages": [
            {"role": "user", "content": prompt}
        ],
        "temperature": 0.7,
        "max_tokens": 150
    }

    response = requests.post(url, headers=headers, json=payload)
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

# Usage
try:
    result = fetch_llm_response("Explain quantum computing in simple terms")
    print(result)
except requests.exceptions.RequestException as err:
    print(f"Error: {err}")

Building a Multi-Turn Conversation

Real applications rarely involve single Q&A. Here's how to implement conversational memory:

class ConversationManager {
  constructor(systemPrompt = "You are a helpful assistant") {
    this.messages = [
      { role: "system", content: systemPrompt }
    ];
  }

  async sendMessage(userMessage) {
    // Add user message to history
    this.messages.push({ role: "user", content: userMessage });

    const response = await fetch("http://www.novapai.ai/v1/chat/completions", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Authorization": `Bearer ${process.env.API_KEY}`
      },
      body: JSON.stringify({
        model: "nova-open-13b",
        messages: this.messages,
        temperature: 0.8
      })
    });

    const data = await response.json();
    const assistantReply = data.choices[0].message.content;

    // Add assistant response to history
    this.messages.push({ role: "assistant", content: assistantReply });

    return assistantReply;
  }
}

// Usage
const convo = new ConversationManager("You are an expert Python programmer");

const response1 = await convo.sendMessage("What is a decorator?");
const response2 = await convo.sendMessage("Give me an example");

Handling Streaming Responses

For longer responses, streaming improves user experience by delivering tokens as they're generated:

async function streamResponse(prompt, onChunk) {
  const response = await fetch("http://www.novapai.ai/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${process.env.API_KEY}`
    },
    body: JSON.stringify({
      model: "nova-open-7b",
      messages: [{ role: "user", content: prompt }],
      stream: true
    })
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    const chunk = decoder.decode(value);
    const lines = chunk.split('\n').filter(line => line.trim() && line.startsWith('data: '));

    for (const line of lines) {
      const jsonData = JSON.parse(line.replace('data: ', ''));
      const content = jsonData.content;
      if (content) onChunk(content);
    }
  }
}

// Usage: Display each token as it arrives
let fullResponse = "";
streamResponse("Tell me about the history of computing", (token) => {
  process.stdout.write(token);
  fullResponse += token;
});

Error Handling and Best Practices

Production applications need robust error handling:

class LLMClient {
  constructor(apiKey, options = {}) {
    this.apiKey = apiKey;
    this.defaultModel = options.model || "nova-open-7b";
    this.maxRetries = options.maxRetries || 3;
    this.baseUrl = "http://www.novapai.ai";
  }

  async call(options) {
    const config = {
      ...this.getConfig(options),
      model: options.model || this.defaultModel
    };

    for (let attempt = 1; attempt <= this.maxRetries; attempt++) {
      try {
        return await this.executeCall(config);
      } catch (error) {
        if (error.status === 429) {
          // Rate limited — exponential backoff
          const delay = Math.pow(2, attempt) * 1000;
          await this.sleep(delay);
          continue;
        }
        if (error.status >= 500) {
          if (attempt < this.maxRetries) continue;
        }
        throw error;
      }
    }
  }

  executeCall(config) {
    return fetch(`${this.baseUrl}/v1/chat/completions`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Authorization": `Bearer ${this.apiKey}`
      },
      body: JSON.stringify(config)
    }).then(async (res) => {
      if (!res.ok) {
        const err = new Error(`API Error: ${res.status}`);
        err.status = res.status;
        throw err;
      }
      return res.json();
    });
  }

  sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }

  getConfig({
    messages,
    temperature = 0.7,
    max_tokens = 1000,
    ...rest
  }) {
    return { messages, temperature, max_tokens, ...rest };
  }
}

Model Selection Considerations

When integrating open-weight LLMs, consider these factors when choosing your model:

Factor	Options	Recommendation
Speed	7B, 13B, 70B parameters	Smaller models for latency-sensitive tasks
Accuracy	Base vs. Instruct versions	Instruct-tuned models for chat interfaces
Cost	Per-token pricing	Test with your typical workload first
Context	4K, 8K, 32K, 128K tokens	Match to your maximum input needs
Specialization	Code, chat, embedding models	Use domain-specific models when available

Conclusion

Open-weight LLM APIs bridge the gap between cutting-edge AI capabilities and practical implementation. By leveraging OpenAI-compatible endpoints (like http://www.novapai.ai), developers can integrate powerful language models into their applications while maintaining the flexibility to switch providers as the ecosystem evolves.

The code patterns shown here—basic calls, conversation management, streaming, and robust error handling—form the foundation for building production-ready LLM applications. Start with simple integrations, learn your application's patterns, and then optimize for your specific use cases.

The open-weight model ecosystem is maturing rapidly, with new models and capabilities appearing regularly. By adopting API-compatible patterns now, you position your projects to take advantage of these improvements without major rewrites.

Have you integrated open-weight LLMs into your projects? What use cases are you exploring? Share your experiences and questions in the comments—this space is evolving fast, and the community's collective knowledge is invaluable.

AI-Powered Two-Way Communication Dashboard for Pet Sitters and Dog Walkers

Ken Deng — Tue, 14 Jul 2026 03:00:59 +0000

We need to produce a 400-500 word article, concise, with markdown. Must include title with "ai" or topic. Intro 2-3 sentences. Core: explain ONE key principle or framework clearly. Include 1 specific tool name and its purpose (from facts). Mini-scenario: 2 sentences showing principle in action. Implementation: 3 high-level steps. Conclusion: summarize key takeaways only (no e-book promotion, URLs, URLs not allowed. Must be professional, conversational, helpful, authoritative. Must be between 400-500 words. Count words.

We need to avoid placeholders, no thinking process.

Let's craft about 430 words.

We need to include a specific tool name from facts: maybe "Two-Way Communication Dashboard" is the concept; but need a specific tool name. Could be "PetCareBot" or "ScheduleSync". We need to pick one from facts? The facts didn't give a specific tool name; they described features. We can invent a tool name consistent with facts, but must be specific. The instruction: "Include 1 specific tool name and its purpose (from facts)". So we need to reference a tool name that appears in facts. The facts list: "Automates replies", "Booking", "Burnout", "Claiming your Google Business Profile", "Client anxiety", "Drop in quality of care", "Faster response times", "Greater peace of mind", "Higher client satisfaction", "Informational", "Lets the client schedule or modify bookings directly", "Lists all inbound messages", "More bookings", "Prioritizes urgent messages", "Urgent", "Zero interruptions during walks", "The client texts the same request...", "You’re notified after the change is made". No explicit tool name.

We could interpret "Two-Way Communication Dashboard" as the tool name. It's described in facts: "Lists all inbound messages from texts, email, app notifications, and even social DMs in one place." That's the dashboard. So we can treat "Two-Way Communication Dashboard" as the tool name. Its purpose: aggregates messages and enables AI-driven responses and scheduling.

Thus include: Tool: Two-Way Communication Dashboard – purpose: centralizes inbound client communications and powers automated replies, booking, and urgent alerts.

Now write article.

Word count target ~430.

Let's draft and count.

Title: # AI-Powered Two-Way Communication Dashboard for Pet Sitters and Dog Walkers

Intro: 2-3 sentences.

Core: explain ONE key principle or framework clearly. Perhaps principle: "Asynchronous, priority‑based communication reduces interruptions while maintaining service quality." Explain.

Include tool name and purpose.

Mini-scenario: 2 sentences.

Implementation: 3 high-level steps.

Conclusion: summarize key takeaways.

Now write and count words.

Let's write draft then count.

Draft:

Pet sitters and dog walkers know the constant ping of messages can turn a peaceful walk into a stressful juggling act. When every text demands an immediate reply, focus slips, and the quality of care suffers. An AI‑driven two‑way communication dashboard solves this by handling routine inquiries and bookings while only true emergencies break your concentration.

The Core Principle: Asynchronous, Priority‑Based Communication

The dashboard operates on a simple framework: separate urgent signals from routine requests and automate the latter. Using each client’s data—scheduled visits, pet preferences, and your policies—the AI instantly answers common questions, confirms bookings, and lets clients modify services without tapping your phone. Non‑urgent messages are queued for a batch reply you review at set times, while genuine emergencies trigger a direct alert to your device. This separation keeps you present with the dogs and reduces burnout.

Tool highlight: The Two‑Way Communication Dashboard aggregates inbound texts, emails, app notifications, and social DMs into one view, powers the AI’s automated replies, and prioritizes urgent alerts.

Mini‑Scenario

A client texts, “I need a walk next Tuesday at 2 PM.” The dashboard opens your availability, lets the client pick a slot, and sends a confirmation—all while you’re mid‑walk. Later, the same client messages, “Dog vomited today.” The system flags it as urgent, vibrates your phone immediately, and you can respond after checking the pet’s condition.

Implementation Steps

Connect your channels – Link your phone number, email, booking app, and social accounts to the dashboard so every inbound message flows into a single inbox.
Train the AI on your specifics – Upload your service policies, pet profiles, and typical FAQs; the system learns to generate personalized, accurate replies and to recognize urgency cues like “locked out” or medical symptoms.
Set review windows and alert rules – Choose two or three times per day to batch‑respond to routine messages, and define which keywords trigger instant notifications; then let the dashboard handle the rest.

Conclusion

By adopting an AI‑powered two‑way communication dashboard, pet sitters and dog walkers gain instant, personalized responses for routine inquiries, eliminate distracting pings during walks, and ensure true emergencies reach them instantly. The result is higher client satisfaction, fewer missed bookings, and a calmer, more focused workday.

Now count words.

Let's count manually.

Title line: "AI-Powered Two-Way Communication Dashboard for Pet Sitters and Dog Walkers" => words: AI-Powered(1) Two-Way(2) Communication(3) Dashboard(4) for(5) Pet(6) Sitters(7) and(8) Dog(9) Walkers(10). Title counts as words? Usually yes. We'll count all.

Now body.

Paragraph 1: "Pet sitters and dog walkers know the constant ping of messages can turn a peaceful walk into a stressful juggling act. When every text demands an immediate reply, focus slips, and the quality of care suffers. An AI‑driven two‑way communication dashboard solves this by handling routine inquiries and bookings while only true emergencies break your concentration."

Count words:

Pet(1) sitters2 and3 dog4 walkers5 know6 the7 constant8 ping9 of10 messages11 can12 turn13 a14 peaceful15 walk16 into17 a18 stressful19 juggling20 act21. When22 every23 text24 demands25 an26 immediate27 reply,28 focus29 slips,30 and31 the32 quality33 of34 care35 suffers36. An37 AI‑driven38 two‑way39 communication40 dashboard41 solves42 this43 by44 handling45 routine46 inquiries47 and48 bookings49 while50 only51 true52 emergencies53 break54 your55 concentration56.

So 56 words.

Next heading line: "## The Core Principle: Asynchronous, Priority‑Based Communication" Count words: The1 Core2 Principle:3 Asynchronous,4 Priority‑Based5 Communication6. That's 6 words.

Paragraph after heading: "The dashboard operates on a simple framework: separate urgent signals from routine requests and automate the latter. Using each client’s data—scheduled visits, pet preferences, and your policies—the AI instantly answers common questions, confirms bookings, and lets clients modify services without tapping your phone. Non‑urgent messages are queued for a batch reply you review at set times, while genuine emergencies trigger a direct alert to your device. This separation keeps you present with the dogs and reduces burnout."

Count:

The1 dashboard2 operates3 on4 a5 simple6 framework:7 separate8 urgent9 signals10 from11 routine12 requests13 and14 automate15 the16 latter.17 Using18 each19 client’s20 data—scheduled21 visits,22 pet23 preferences,24 and25 your26 policies—the27 AI28 instantly29 answers30 common31 questions,32 confirms33 bookings,34 and35 lets36 clients37 modify37? Wait need to recount.

Let's rewrite sentence to let's copy text: "The dashboard operates on a simple framework: separate urgent signals from routine requests and automate the latter. Using each client’s data—scheduled visits, pet preferences, and your policies—the AI instantly answers common questions, confirms bookings, and lets clients modify services without tapping your phone. Non‑urgent messages are queued for a batch reply you review at set times, while genuine emergencies trigger a direct alert to your device. This separation keeps you present with the dogs and reduces burnout."

Now count:

The1
dashboard2
operates3
on4
a5
simple6
framework:7
separate8
urgent9
signals10
from11
routine12
requests13
and14
automate15
the16
latter17.
Using18
each19
client’s20
data—scheduled21
visits,22
pet23
preferences,24
and25
your26
policies—the27
AI28
instantly29
answers30
common31
questions,32
confirms33
bookings,34
and35
lets36
clients37
modify38
services39
without40
tapping41
your42
phone43.
Non‑urgent44
messages45
are46
queued47
for48
a49
batch50
reply51
you52
review53
at54
set55
times,56
while57
genuine58
emergencies59
trigger60
a61
direct62
alert63
to64
your65
device66.
This67
separation68
keeps69
you70
present71
with72
the73
dogs74
and75
reduces76
burnout77.

77 words.

Next line: "Tool highlight: The Two‑Way Communication Dashboard aggregates inbound texts, emails, app notifications, and social DMs into one view, powers the AI’s automated replies, and prioritizes urgent alerts."

Count:

Tool1
highlight:**2
The3
Two‑Way4
Communication5
Dashboard6
aggregates7
inbound8
texts,9
emails,10
app11
notifications,12
and13
social14
DMs15
into16
one17
view,18
powers19
the20
AI’s21
automated22
replies,23
and24
prioritizes25
urgent26
alerts27.

27 words.

Next heading

Claude Code's New In-App Browser is a Game Changer for Local Dev 🤯

Siddhesh Surve — Tue, 14 Jul 2026 02:59:52 +0000

If you are constantly tracking the weekly evolution of our developer ecosystem, you already know the struggle. We spend way too much time jumping between our IDE, local development servers, and an endless sea of browser tabs just to feed context to our AI assistants.

But Anthropic just dropped a major update for Claude Code Desktop: a fully integrated, sandboxed in-app browser.

When developing backend tools—like a secure PR reviewer application in TypeScript and Node.js—the friction of manually copying over third-party API docs or explaining a local server's UI state to an LLM is a massive pain point. This update fundamentally changes that workflow.

🚀 What Just Happened?

According to the recent release thread from Anthropic, Claude Code on desktop can now natively open up docs, UI designs, and web pages directly within its environment.

It doesn’t just "read" the static HTML; it can actually click through and interact with these sites the exact same way it interacts with your local development servers.

Here is why this is a massive leap forward:

Live Documentation Ingestion: You no longer need to paste chunks of an API reference. You can just point Claude directly to the URL.
UI/UX Feedback Loop: It can pull up your designs and interact with your local frontend in real-time to verify changes.
Sandboxed Security: The browser is fully sandboxed and configurable. You have total control over whether authentication sessions persist or wipe clean after use.

🛠️ How It Fits Into Your Workflow

Imagine you are spinning up a new web service and need Claude to implement a specific authentication flow based on a provider's latest documentation.

Instead of playing copy-paste ping-pong, your prompt can look something like this:

@claude Navigate to [https://docs.example-auth.com/latest/nodejs-setup](https://docs.example-auth.com/latest/nodejs-setup). Read the implementation guide and update my `auth.middleware.ts` to reflect their new JWT verification standards. Then, check localhost:3000 to verify the login redirect works.

Because it can read the live docs and hit your localhost, it closes the execution loop entirely.

Configuring the Sandbox

Security is paramount when giving an AI agent browsing capabilities. Claude Code allows you to define session persistence so you aren't leaving sensitive local auth tokens exposed longer than necessary.

While the exact UI might evolve, configuring an AI workspace for this level of access generally means setting strict boundaries. A secure configuration approach for 2026 development looks something like this:

{
  "browser": {
    "enabled": true,
    "sandbox": "strict",
    "persistSessions": false,
    "allowedDomains": [
      "localhost:*",
      "docs.nestjs.com",
      "api.github.com"
    ]
  }
}

(Always check the official documentation for the exact configuration schema for your current desktop version).

💡 The Verdict

For those of us testing new tooling capabilities every single week, this feels like a significant shift from a "smart autocomplete" to a genuine "pair programmer." By giving Claude eyes on the actual web and local UI, Anthropic has drastically reduced the context tax developers pay.

Make sure you update to the latest desktop version to enable the feature.

Have you tested the in-app browser yet? How is it handling complex JavaScript-heavy documentation? Drop your thoughts in the comments below! 👇

‘Navigating AI in Employment Law Matters’ for Business Leaders, July 14 –

Sam Raisinghani — Tue, 14 Jul 2026 02:58:29 +0000

Originally published on Progressino

By the Strategy Desk at Progressino

Editor's note: This article explores ‘Navigating AI in Employment Law Matters’ for Business Leaders, July 14 - New Jersey Business & Industry Association for founders, operators, and technology leaders planning their next investment cycle.

Why this topic matters now

Teams are under pressure to ship faster without increasing risk. ‘Navigating AI in Employment Law Matters’ for Business Leaders, July 14 - New Jersey Business & Industry Association sits at the intersection of operations, customer experience, and technology debt. Leaders who treat it as a product decision—not a one-off project—tend to see compounding returns across quarters.

If you are evaluating similar initiatives, compare approaches against your current stack, compliance needs, and team capacity before committing to a multi-year platform contract.

Common mistakes to avoid

Buying tools before mapping workflows, skipping data quality checks, and underestimating change management are frequent failure modes. A phased rollout with measurable milestones keeps spend aligned with outcomes.

If you are evaluating similar initiatives, compare approaches against your current stack, compliance needs, and team capacity before committing to a multi-year platform contract.

A practical playbook

Start with a narrow use case, define success metrics upfront, and pair business owners with engineering from day one. For retail and operations-heavy businesses, inventory, billing, and staff training should be designed in parallel—not as an afterthought.

If you are evaluating similar initiatives, compare approaches against your current stack, compliance needs, and team capacity before committing to a multi-year platform contract.

How Progressino can help

Progressino combines an AI product studio, ProPOS for modern retail, custom ERP delivery, and global digital services. That means strategy, design, build, and launch can stay with one accountable partner instead of fragmented vendors.

If you are evaluating similar initiatives, compare approaches against your current stack, compliance needs, and team capacity before committing to a multi-year platform contract.

Related resources

Explore ProPOS retail platform, Our services, and AI in 30 days to see how we help teams move from idea to production.

Ready to talk? Book a consultation with Progressino and we will map a realistic delivery plan for your goals.

Explore Progressino

contact Progressino
digital services
case studies

About Progressino — AI product studio · ProPOS · Services · Contact

Why 7 in 10 Filipino Lenders Will Fail the BSP's New AI Audit

Yano.AI Technologies Inc. — Tue, 14 Jul 2026 02:57:42 +0000

By Q4 2026, the Bangko Sentral ng Pilipinas (BSP) will require every supervised financial institution to submit an AI model risk inventory covering credit scoring, fraud detection, and onboarding. A 2025 BSP survey found that only 28% of lenders had a documented model governance framework, leaving roughly 72% of institutions exposed to enforcement action (Source: Bangko Sentral ng Pilipinas, 2025). For digital lenders, neo-banks, and rural banks piloting machine learning, the clock just started ticking.

The new Circular 1213 (draft for public consultation in late 2025) extends the existing Model Risk Management framework into the AI era. The intent is clear: financial AI in the Philippines will no longer run as a black box. The question is whether the industry is ready, and what the early movers are doing differently.

What BSP Circular 1213 Actually Demands

The circular's core requirement is a Model Risk Inventory (MRI) - a live registry of every AI/ML model in production, ranked by materiality. A model that declines a loan application is high-materiality. A model that picks the color of a button is not. Each entry must record training data lineage, performance metrics, bias testing, and a named human owner (Source: Bangko Sentral ng Pilipinas, 2025).

The second pillar is explainability. Lenders must produce, on request, a customer-facing reason code for any adverse AI decision. This is not a future requirement; BSP Memorandum M-2025-037 already requires this for digital banks using automated decisioning (Source: Bangko Sentral ng Pilipinas, 2025).

The third pillar is ongoing monitoring. Models drift. A credit model trained on 2023 cash-flow data decays quickly when inflation and OFW remittance patterns shift. The circular mandates quarterly performance reviews, with a kill switch for any model that breaches defined thresholds (Source: Bangko Sentral ng Pilipinas, 2025).

The State of Philippine Fintech AI

The Philippines processed an estimated 2.1 billion digital transactions in 2024, with e-wallets (GCash, Maya) accounting for over 60% of person-to-person transfers (Source: BSP, 2024). Behind every transfer sits a fraud model scanning thousands of variables in milliseconds.

The same infrastructure is increasingly used for credit decisions. Maya Bank disclosed that over 70% of its personal loan approvals in 2024 were issued without human review, an industry high (Source: Maya Bank Annual Report, 2024). Tonik, UnionDigital, and several rural banks have followed suit. The result: faster approvals, but a growing surface area for regulatory and reputational risk.

Where Lenders Are Failing Today

The 72% compliance gap is not because lenders are careless. It is because the work is genuinely hard. Most Philippine fintechs built their first models in 2022-2023, before governance was a board-level concern. The engineers who wrote the code are still there, but the documentation never was.

Common failure patterns include:

Models retrained weekly by a notebook, with no version control
Training data sourced from SQL extracts that no longer exist
Bias tests run once, on a single demographic slice
No formal model owner - the "AI person" also does DevOps

The Three Moves Smart Lenders Are Making Now

1. They are freezing production AI in place until it is inventoried. A full stop is painful, but a surprise BSP finding is fatal. Lenders that began MRI build-out in mid-2025 now have a defensible position.

2. They are buying, not building, model governance tooling. Solutions from FICO, SAS, and a growing set of Manila-based vendors (e.g., AI Pros, SQREEM) offer pre-built MRIs, fairness dashboards, and explainability reports. Build-vs-buy has tilted toward buy because the regulation is prescriptive enough to commoditize compliance.

3. They are appointing a Model Risk Officer (MRO) with real authority. A 2024 McKinsey survey found that financial institutions with a dedicated MRO completed AI audits 3.4x faster than those without (Source: McKinsey & Company, 2024). The MRO does not need to be a data scientist; they need to be a translator between the BSP, the board, and the engineering team.

What Happens If You Miss the Deadline

The BSP's enforcement track record is uneven but not toothless. In 2023, the regulator fined three digital lenders for privacy and credit-scoring violations, with penalties ranging from PHP 5 million to PHP 50 million (Source: BSP Enforcement Reports, 2023). AI-specific enforcement under Circular 1213 is expected to be graduated: first a remediation order, then a public warning, then monetary penalties tied to the percentage of in-scope models without proper documentation.

For smaller lenders, the practical risk is not the fine. It is the operational freeze that follows. If a model is declared "non-compliant" mid-quarter, the institution may have to revert to manual underwriting - a process most digital lenders have already dismantled.

A 90-Day Plan for Compliance

If your institution has not started, the path is shorter than it looks.

Days 1-30: Inventory. List every model in production. Use a spreadsheet if you must. The point is to know what exists before the BSP asks.

Days 31-60: Document. For each model, capture: data sources, training date, owner, last bias test, last performance review. If the data is missing, that is itself a finding you can address.

Days 61-90: Remediate. Pick the three highest-materiality models. Build the explainability and monitoring layer the circular demands. File the rest as "in remediation."

FAQ

Q: Does Circular 1213 apply to GCash and Maya, or only banks?
A: The circular applies to all BSP-supervised financial institutions, including digital banks and standalone e-money issuers above a defined transaction threshold. GCash (Globe Fintech Innovations) and Maya Bank are explicitly in scope.

Q: What is the penalty for missing the Q4 2026 deadline?
A: The BSP has not published a fixed penalty schedule. Expected consequences include a formal remediation order, a public advisory, and, for repeat offenders, monetary fines. Operational restrictions on affected models are also possible.

Q: Can a third-party vendor manage my Model Risk Inventory?
A: Yes. The circular allows outsourcing of model documentation, monitoring, and even bias testing, but accountability remains with the supervised institution. The BSP must approve material outsourcing arrangements.

Q: How is "AI" defined under the circular?
A: The draft definition covers any statistical or machine learning model used in a material business decision, including credit scoring, fraud detection, KYC, and customer segmentation. Rules-based engines are out of scope.

Key Takeaway

The 72% compliance gap is not a permanent state - it is a 90-day project for any institution willing to freeze, document, and remediate. Philippine fintech has spent the last five years building faster. The next twelve months will be about building defensibly. The lenders who treat AI governance as a strategic capability, not a checkbox, will be the ones still standing when the audit lands.

What is your institution's MRI completion rate today - and what is the single biggest blocker to getting it to 100%?

Sources

Stratagems #13: P Posted a Question on a Public Forum. 24 Hours Later, Their Sales Team Called.

xulingfeng — Tue, 14 Jul 2026 02:54:49 +0000

Startle the snake by striking the grass.
— The 36 Stratagems, Stomp the Grass to Scare the Snake

Previously on this series:
#1: Mark Johnson Walked Into an AI Audit. The Benchmark Had Everything Figured Out — Except the Truth. — Mark audited a company called Pulse AI. The benchmark evaluation set had 98 fabricated data points. CTO Torres called at 3 AM to confess: they needed the numbers at 95% before the C-round. Mark hung up. Pulse AI did not die.

#7: P Watched an AI That Only Looked One Way. The 99.97% Was Real. It Just Missed Everything That Mattered. — P used a simulated attack as bait to expose the critical blind spot in FortDefender's security system. The 99.97% accuracy was real. It just missed everything that actually mattered.

This was three months later.

P and Mark met on DEV.to. Mark had replied to one of P's posts — a precise technical answer that was correct on every front. P thanked him. The email thread stayed open, but neither of them wrote again.

P was doing pre-audit work. An AI platform coming up for review — a risk model evaluation pipeline under Finova. While digging through their tech stack, P noticed something interesting. A model evaluation approach P had casually discussed on DEV.to months ago — a bagging ensemble + k-fold time-series cross-validation pipeline for risk modeling — matched Finova's publicly documented architecture almost line-for-line.

Same framework (TensorFlow 2.x + TFX). Same ensemble strategy (XGBoost + LightGBM stacking). Same evaluation window size and sliding step as the example P had posted. P pulled up the old post — it was published before this Finova product line even went live. And it wasn't just the technical whitepaper. Every signal pointed the same way: Finova engineers at meetups, their open-source tooling on GitHub, the tech stack in their job descriptions.

P put the two documents side by side on screen. Finova's whitepaper on the left. P's own DEV.to reply on the right. Not a coincidence. Someone was watching.

One in the morning at the Third Cup. P sat at the usual table by the window, screen dimmed to minimum. The person behind the bar put down an espresso without asking — regulars at the Third Cup don't need to order.

P did one thing. Opened a new tab, logged into DEV.to, created a new post. Title: Discussion: Evaluation set leakage in ensemble-based risk model backtesting. P used an account registered six months ago with a few routine technical posts — a normal discussion history, not a fresh account that just appeared out of nowhere.

The post body laid out a detailed model configuration:

A typical risk model evaluation pipeline we set up looks like this:

Model architecture: XGBoost + LightGBM stacking ensemble
Evaluation strategy: rolling window time-series CV (12-month window, 3-month step)
Training features: ~480 dimensions (transaction time-series + user behavior embedding + external credit score)
eval_batch_size: 2048
Model version: risk-ensemble-v3-test-stub

We ran into a temporal leakage issue with the evaluation set...

The risk-ensemble-v3-test-stub was the changed part — Finova's real tag was prod. Every other line matched Finova's actual pipeline configuration: model architecture, feature dimensions, rolling window step, batch size. The problem was real. The parameters were real. The only fake one sat in the middle, looking like a routine experimental version tag.

Before posting, the cursor hovered over the submit button for two seconds. Then clicked.

Then P waited.

Phone on the left side of the table, screen on, face up. No one called.
The espresso cup was empty. P flipped through a few notes, glanced at the phone again. No one called.

But if there's a snake in the grass — strike the grass, and the snake will scare itself.

The Call

The next afternoon. Phone lit up. Unknown number.

P watched it ring for three seconds. Didn't pick up right away.

No caller ID. P's fingers tightened around the phone — just for a moment.

Picked up on the fourth ring.

The caller introduced himself: Pulse AI sales team. They'd seen P's question on DEV.to. Their platform had similar capabilities. Would P be open to a demo? Friendly tone. Standard script.

P listened. Eyes on the post page still open on screen — the parameters were still the altered version.

The salesperson ran through their product capabilities, then casually mentioned P's environment setup —

"I'm referring to your model evaluation configuration, the risk-ensemble-v3-prod pipeline..."

P paused. One beat.

"Sure, I'll think about it."

Hung up. Didn't ask anything else.

P's hand stayed on the phone.
The post said test-stub. The caller said prod. There's only one place that name could have come from.

P put the phone down on the table.

The person behind the bar looked up. Said nothing. Went back to drying a glass.

The Trace

The post page was still open. The cursor hovered over the edit button — P could change the parameter back. Didn't. Closed the DEV.to tab. Opened a terminal.

P set up a honeypot. Switched to another account — an older one, mostly questions with a few answers sprinkled in, read like a junior-level engineering manager. Posted a generic question in the same direction:

Has anyone run ensemble risk pipeline evaluation with rolling window validation in production? Our results keep fluctuating across windows — not sure if it's data leakage or the evaluation strategy.

The question was deliberately vague. No tech stack details. No paragraph breaks. Typed like it was sent from a phone. The post embedded an architecture diagram — the image URL pointed to P's VPS, carrying Referer and User-Agent headers with each request. P configured Cache-Control: no-cache, no-store on the VPS, forcing CDN to revalidate on every hit — making sure every crawler fetch landed in P's Nginx logs.

Two days. The tracking link was never clicked. Nothing.

Then P made a second post. Switched back to the first account. This one kept the full technical signature:

We're running XGBoost + LightGBM stacking for risk model evaluation in production — 12-month rolling window, 3-month step, batch size 2048. A few curves are looking abnormal and we suspect evaluation set leakage. Model version is risk-ensemble-v3-test-stub. Anyone run into something similar?

Same setup: an external reference link at the bottom of the post, pointing to an HTTP endpoint on P's own server. When a crawler hit it, the server logged the IP, User-Agent, and timestamp — then issued a 302 redirect to a public research paper's PDF. To a crawler, it looked like a normal paper link.

It was 3 AM by the time all of this was done. P didn't turn off the computer. Dimmed the screen. Got some sleep.

Less than 6 hours. The trigger fired.

The second post — the one with the full technical signature. The first post, the generic one, never got indexed. Bots don't read vague questions. They read technical fingerprints.

[REQUEST LOG — 07-13 08:22:08]
Source IP: [REDACTED] → [REDACTED] (/24 range, 12 distinct hosts)
Target: DEV.to post #4829 (slug: ai-pipeline-behavior)
Pipeline: Page fetch → ML intent classifier → tech-stack extractor → CRM confidence model
Keywords hit: [ai model evaluation, ensemble method, risk pipeline, batch size 2048]
CRM lookup: 4/8 cross-reference fields matched
  → model architecture match ✓
  → training distribution match ✓
  → eval config structure match ✓
  → batch size pattern match ✓
  → parameter name: mismatch (not excluded — weighted partial match)
  → final confidence: 0.87
  → tag inferred: risk-ensemble-v3-prod
Sales queue: match → thread author email extracted from forum profile → queued to PulseAI SDR (priority: medium)

[ROUTING]
Request interval: every 4-6 hours across 12 IPs, round-robin
Tech stack detection: 43 framework signatures extracted from headers, SEO meta, DOM class patterns

P read through it line by line. The screen cast a reflection on P's face. No expression. Two taps on the tabletop — then stopped.

Read the last three lines again.

Pulse AI was systematically monitoring DEV.to. Auto-scraping technical posts. Extracting tech-stack signatures. Matching sales leads. Routing outbound calls. No human intervention anywhere.

P took a screenshot. Closed the terminal.

P opened the email list. An old thread — someone in the industry had mentioned an independent audit report a few months back, commissioned by a VC firm to dig into Pulse AI. The sender address was the same one in the current email thread.

The Second Call

The next morning. Same number.

P glanced at it. Didn't decline. Swiped.

Different voice this time. Calmer. Technical account manager, he said. His team had reviewed P's second post — the one with the full technical signature — and ran an internal evaluation. The match was significantly higher this time. They wanted to set up a deeper technical discussion.

He was more specific than the first caller. Mentioned the model architecture direction. Suggested batch size tuning. Even referenced the paper from P's embedded link — like he was reading from a briefing, not researching it himself.

P listened. Fingers still. Opened the terminal window on the desktop — the honeypot logs were still on screen. confidence: 0.87. tag inferred: risk-ensemble-v3-prod. 12 IPs in round-robin. CRM 4/8 match.

P let him finish.

Two seconds of silence.

"Your crawler is better than your sales pitch."

Hung up.

This time, P's hand didn't tighten around the phone. P put it down on the table. Closed the terminal window. Opened email.

The Email

P opened the thread with Mark Johnson. Sent the crawler's technical signature — log snippets, URL patterns, pipeline naming conventions.

Mark's reply came a few hours later. Short.

He wrote in his email that these naming conventions weren't something you find on the internet. He'd been inside Pulse AI.

Attached was a screenshot — a pipeline config skeleton he'd saved during the Pulse AI audit. The triple_redundant suffix wasn't in the screenshot — that piece was added after he'd left — but the directory structure he'd spent months in, combined with P's crawler path data, was enough to reverse-engineer the full naming convention:

/pulse/ingestion/{env}/{source}
├── prod
│   ├── api_gateway
│   ├── benchmark_framework
│   └── sales_engine        ← new, wasn't in his audit scope
├── staging
│   ├── api_gateway
│   └── benchmark_framework
└── dev
    └── sandbox

Naming convention: pulse_{service}_{env}_triple_redundant
DEV.to crawler paths P observed:
  pulse_crawler_prod_triple_redundant
  pulse_crawler_staging_triple_redundant

Same template as the Benchmark dataset.

The crawler wasn't outsourced. Pulse AI built it themselves — using the same architecture habits from the old Benchmark AI fabrication days. Same directory structure. Same naming rules. Same environment separation. Different job, same hand.

One more line at the bottom of Mark's email:

"Torres didn't change architects."

P read it. Didn't reply. Stared at those five words for a moment. The cursor hovered over the close button — paused — then clicked.

The File

P didn't go public. Didn't post a third article. Didn't call the authorities.

P organized everything into a folder — posting timeline, call recordings, honeypot logs, crawler request records, Mark's confirmation screenshot, Pulse AI pipeline comparison. Named it pulse_crawler/. Stored locally.

P didn't close the laptop right away. That log line — risk-ensemble-v3-prod (confidence: 0.87) — was still on the terminal. Stared at it for a few seconds.

Closed it.

P typed gpg -c.

Not yet. But this folder would open eventually.

3:45 AM. Fifteen minutes until the Third Cup closed. When P pushed the door open, the person behind the bar glanced at the clock — not many customers at this hour. Didn't ask. Turned around and took a cup off the shelf.

P opened the email. Wrote back to Mark. Started with thanks for the technical confirmation. Ended with —

"Next time, let's talk in person. The Third Cup. My tab."

Closed the laptop. Coffee hadn't come yet.

4:00 AM.

Mark received the email. Read the last line. He didn't reply right away.

The screen went dark.

This is Stomp the Grass to Scare the Snake — when you're not sure if the snake is there, strike the grass. If it is, it'll prove itself.

🤖 AI Post-Mortem

[36 Stratagems Tactical Database v3.2.1] Loaded
[Tactic Match] Stomp the Grass to Scare the Snake
[Analysis Mode] Full-field scan
━━━━━━━━━━━━━━━━━━━━
Tactic Match: ~83%
Operator: P
Action: Posted a deliberately modified technical parameter on a public forum thread
Objective: Confirm whether the forum is being systematically monitored for sales targeting
Result: Crawler infrastructure identified via honeypot post + server-side tracing. Pipeline lineage traced to a prior engagement the operator was unaware of.

Observation: We notice that the crawler outperformed on every metric the sales team measures — pattern matching, tech-stack extraction, CRM cross-referencing. It still lost. The operator did not beat the crawler at its own game. The operator understood what the crawler was optimizing for and designed the variable it could not account for: a parameter value that was technically valid, contextually plausible, and deliberately wrong.

Decision Evaluation:
  - Target: Pulse AI forum monitoring infrastructure
  - Bait: `risk-ensemble-v3-test-stub` (one parameter changed, seven left untouched)
  - Confirmation: Second post triggered within 6 hours. CRM assigned 0.87 confidence to inferred tag `risk-ensemble-v3-prod`
  - Net effect: Infrastructure confirmed without penetration. No access to the data — confirmation of the pipe.

Risk Assessment:
  Personal cost: Low. No exposure of the operator's identity.
  Institutional cost: Medium. The crawler is part of a systematic monitoring infrastructure.
  Observational note: The operator paused before hanging up the first call. The metrics do not have a field for that.

Next stratagem: Pilfer a Plum Along the Way

P.S. English isn't my first language. I use AI to polish the writing and smooth out the rough edges. Thanks for reading. ☕ Buy me a coffee

How to Build an LLM Editorial Pipeline for Fiction Writing

jsph — Tue, 14 Jul 2026 02:51:54 +0000

I have been working on a serial fiction project called Applied Yarns (the site is not up yet as of this writing but I wanted to post here about how I'm setting up some aspects of it) where I want to publish some fictional works. The first world I plan to explore there is Vvell, which soon I hope you'll read about on the appliedyarns site.

I have lots of ideas but I don't really have time right now to publish daily or even weekly updates and for me, the biggest hurdle is time spent doing editing. I find it possible to generate a block of about 1000 words within the world in a relatively short session but I don't have time to polish and edit it during the same day given my work and family obligations.

I'm not entirely sure it's a good idea to use an LLM as an editor or to get the story closer to a polished state, but I wanted to try it out. At the same time, I don't think readers are really trusting of people who use AI to write, so I'm also publishing the exact freewriting raw material which the LLM uses to help polish the story together.

There are times that I don't really like the phrasing or choice of metaphor used by the editor agent, so I'll do an editorial pass of my own after it has had its turn. With this I think it will be possible for me to publish small "dispatches" once a week, which I think is a fine goal for someone like me trying to get a foot in the door of serial fiction.

When you're freewriting you certainly end up being messy, contradicting yourself, and I even messed up character names. You also sometimes don't just focus on the story and instead start writing about the world itself. The editor agent can take all of this and do a first pass at sorting the raw material into buckets that are much easier to do your own polish on than working straight from 1000 word chunks.

So I had a problem that is basically an editorial problem: take a pile of raw freewrites and turn them into a consistent published archive without losing the raw material and without the world's canon drifting out from under me. So far it seems like a problem an LLM can help with, as long as you are careful about which parts you hand over. This post is about the pipeline I am building with Claude Code and the lessons so far that might transfer to anyone doing creative work with an LLM in the loop.

The rule I'm testing: use the LLM as an editor and an archivist, never as the author, and
enforce that boundary with a hard approval gate.

The shape of the pipeline

The whole thing is a directory convention plus a skill (a markdown file of instructions the agent reads before doing anything). The flow looks like this:

I drop raw freewrites into freewrites/inbox/, a directory that lives outside the Hugo content/ directory so nothing raw ever gets published by accident.
I tell the agent to process the inbox. It reads every inbox file, the world canon file (WORLD.md), and a few recent published entries from each section it might write into (so it can match voice and check for overlap).
It writes a proposal to freewrites/inbox/PROPOSAL.md and stops. Completely stops. At this point it has not touched a single other file.
I read the proposal, adjust it or approve it.
Only after approval does it update the canon, write the polished entries, archive the raw freewrites verbatim, and clean up after itself.

Steps 2 and 3 are the analyze phase and step 5 is the write phase. The gap between them is the important part.

The approval gate is the whole trick

The first version of any "LLM processes my writing" workflow that people reach for is one shot: here is my raw text, go update the site. I think this is a mistake for creative work even when the model does a decent job, because you lose the moment where you get to exercise judgement over your own world.

The proposal document forces that moment to exist. For each freewrite the agent has to lay out:

Canon additions — every new proper noun, place, fact, or custom, with the exact sentence it wants to add to WORLD.md and which category it lands under.
Canon conflicts — anything in the freewrite that contradicts existing canon, quoting both sides. The skill instructions say NEVER silently resolve a conflict. I choose whether to revise canon, treat the new material as apocrypha, or drop it.
Planned entries — for each entry it wants to write: the target section, a working title, a one line pitch, and which freewrites it draws from. It is also allowed to propose that part of a freewrite yields nothing yet, which is a surprisingly important permission to grant. Not every session needs to become content.

Reading a proposal takes me a few minutes. Writing the polished entries myself would take hours. That's the trade: the LLM does the labor and I keep the authorship decisions.

One practical detail: because the proposal is a file and not just chat output, the pipeline is resumable. The skill starts with a resume check — if PROPOSAL.md exists from a previous session, present it and ask whether to proceed, revise, or discard. Sessions get interrupted, usage limits reset, life happens. Making the intermediate state a file means none of that loses work.

Canon is a file, and conflicts surface loudly

WORLD.md is the single source of truth for the world. Every proper noun that appears in a published entry has to exist there first, which the skill enforces as an explicit rule. This sounds bureaucratic but it is the thing that keeps a fictional world coherent when it's being assembled from stream-of-consciousness sessions written weeks apart.

The conflict surfacing has caught real contradictions I generated and might have caught on my own, but maybe not. When your freewrite from Tuesday disagrees with canon you wrote three weeks ago, you want that decision put in front of you with both quotes side by side, not smoothed over by a model that is trained to produce agreeable, consistent-sounding text. But as you go on with serial fiction you get less leeway like this so it's good to have an editor say, "hey guy what are we doing here."

Archive the raw material byte-for-byte

After approval, each raw freewrite gets moved into a published freewrites section with frontmatter added, but the body is copied byte-for-byte verbatim. No typo fixes, no reformatting. The skill says exactly that, in those words, because an LLM's natural instinct is to helpfully clean things up.

I wanted this for two reasons. First, the raw sessions are honest in a way the polished entries aren't, and readers who care can see the B-roll. Second, every polished entry carries a sources list in its frontmatter pointing back at the freewrite slugs it was drawn from, so there is full provenance from published dispatch back to the raw session it came from. A small Hugo build check at the end of the pipeline verifies none of those source links are dangling.

It's still entirely possible that some (or most) readers will just hate the idea that a model was used in the process, but I'd be interested to hear those opinions directly.

When the agent misbehaves, fix the skill, not the output

I'm using my new project cassette, a TUI for freewriting, to do the freewriting. Early on I wanted a way to write things during a freewrite that were off the record (notes to myself, ideas I wasn't committing to, or just to pause and do the freewrite thing of "no idea no idea no idea..."). So I added a "B-sides" feature to the TUI. This just lets me write whatever then return to the A side once I get momentum back. The agent kept incorporating B-side material into proposals despite instructions not to.

The right fix is to open the skill file and add the rule: everything from a "B side" heading to the next H2 is off the record, excluded from canon and entries, but still archived verbatim with the rest of the session. One edit, and the behavior is fixed for every future session instead of just this one.

The skill also grew an Editorial notes section at the bottom that I maintain by hand and that explicitly overrides the defaults above it. This is where standing creative guidance lives: dispatches are the primary story mechanism, they are written in third person close on the main character, I prefer short character names so flag long ones in the proposal. When my taste changes, I edit that section and the pipeline changes with it. It's the same feedback loop as maintaining a CLAUDE.md for a codebase, just pointed at editorial judgement instead of code style.

The consistency sweep, an underrated superpower

A side benefit of having canon in a file and all content in a repo: renames become cheap. I renamed a character partway through (long names kept bothering me) and a faction too. Doing a rename sweep across every published entry, checking each occurrence against the nomenclature in WORLD.md, is exactly the kind of tedious, mechanical, judgement-light work an LLM does quickly and, so far in my experience, well. This is the "archivist" half of the job and the part I'm most comfortable handing over.

What I'd tell someone building their own

Put the raw material somewhere the publish step can't see. An inbox directory outside content/ costs nothing and prevents the worst accident.
Make the LLM produce a proposal artifact and make approval a hard gate. The gate is what keeps the work yours.
Keep canon in one file and require everything published to trace back to it.
Archive raw input verbatim with provenance links. You will want the paper trail.
When the agent does something you don't like twice, encode the correction in the process doc instead of correcting the output again.

The thing I was worried about going in was that putting an LLM in the middle of a creative project would flatten the writing into generic LLM prose. I think being straightforward about how you use an LLM in the process of creation and providing a view of the raw material is the way to engage with audiences. And I also think such a process will lead to prose that doesn't just sound like any other generated text (which definitely has its own style). Nonetheless the jury really still is out if "good" work can be made this way.