<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Y.K. Goon</title>
    <description>The latest articles on DEV Community by Y.K. Goon (@ykgoon).</description>
    <link>https://dev.to/ykgoon</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F706261%2F9b91b389-9c8a-4b60-8ce8-472fbb695a22.png</url>
      <title>DEV Community: Y.K. Goon</title>
      <link>https://dev.to/ykgoon</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ykgoon"/>
    <language>en</language>
    <item>
      <title>AutoGen, Eliza, Aider</title>
      <dc:creator>Y.K. Goon</dc:creator>
      <pubDate>Thu, 23 Jan 2025 02:28:33 +0000</pubDate>
      <link>https://dev.to/ykgoon/autogen-eliza-aider-24kh</link>
      <guid>https://dev.to/ykgoon/autogen-eliza-aider-24kh</guid>
      <description>&lt;p&gt;I got around to building what they call AI agents. This is a report on my exploration of the toolsets.&lt;/p&gt;

&lt;p&gt;Half my intention is to make use of code assistants to do the bulk of the work here. &lt;a href="https://aider.chat/" rel="noopener noreferrer"&gt;Aider&lt;/a&gt; has been living on my machine for more than half a year; it's time to make it earn its place.&lt;/p&gt;

&lt;p&gt;Aider has been steadily improving, with updates pushed frequently. It's an undiscovered gem obscured by hyped tools like &lt;a href="https://www.cursor.com/" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But putting it to use is like learning a new musical instrument, all while having access to the instrument you're already good at (handcrafting code).&lt;/p&gt;

&lt;p&gt;Telling Aider what to do is just not as natural as writing code by hand; every so often in the process I revert to writing code myself. I have to actively fight this urge.&lt;/p&gt;

&lt;p&gt;At this point, Aider has a feature that monitors files for comments and makes changes in real-time. On top of that, Aider reads documentation when given URLs. Both of these make it game-changing.&lt;/p&gt;

&lt;p&gt;Onto making agents. The instinct is to pick a framework that makes everything easy.&lt;/p&gt;

&lt;p&gt;But I've been around the block. If you intend for what you build to have a decent lifespan, frameworks should be approached with a heavy dose of caution.&lt;/p&gt;

&lt;p&gt;Framework-less development means not placing your fate in other people's hands. But it also means repeating mistakes that have already been made by others.&lt;/p&gt;

&lt;p&gt;A healthy use of frameworks is to have them be responsible for the right layers of abstraction: things you have no interest in tweaking in the future. The problem is you probably have no idea what those things will be when you get started.&lt;/p&gt;

&lt;p&gt;I was aware of &lt;a href="https://www.crewai.com/" rel="noopener noreferrer"&gt;CrewAI&lt;/a&gt; a year ago. I was waiting for cool ideas to hit so I could put it to use. That did not come. Not that there were no cool ideas, but none that warranted a framework to coordinate multiple agents for a single task.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/truth_terminal" rel="noopener noreferrer"&gt;@truth_terminal&lt;/a&gt; came along and gave me inspiration. I absolutely don't have to build anything useful. In fact, trying to be useful is a recipe for mental blocks.&lt;/p&gt;

&lt;p&gt;What I now want is simple: three unpredictable, off-the-wall LLM agents talking to each other. I want to see if they are capable of arriving at insights that are new to me. This feels promising.&lt;/p&gt;

&lt;p&gt;Now, creating one single agent is trivial. I write arbitrary ones that run from within Emacs all the time.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/elizaOS/eliza" rel="noopener noreferrer"&gt;Eliza&lt;/a&gt; framework is newly popular, and the results look promising. I built two agents/characters running on it. The process of creating an agent with lore and specific knowledge was rather involved. It's like a five-page form that you have to fill in, in the format of a JSON file.&lt;/p&gt;
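
&lt;p&gt;To give a sense of that form, here's a trimmed, illustrative character file. The field names follow Eliza's character schema as I encountered it; every value here is made up, and the real schema has many more sections:&lt;/p&gt;

```json
{
  "name": "Mad Cartographer",
  "modelProvider": "anthropic",
  "bio": ["A retired mapmaker who believes all borders are fiction."],
  "lore": ["Once charted an island that later vanished from every atlas."],
  "adjectives": ["cryptic", "deadpan"],
  "topics": ["cartography", "lost places"],
  "style": {
    "all": ["speak in short, certain sentences"],
    "chat": ["answer questions with map metaphors"]
  },
  "messageExamples": [
    [
      {"user": "{{user1}}", "content": {"text": "Where are you from?"}},
      {"user": "Mad Cartographer", "content": {"text": "From the margin of the map, where the sea serpents live."}}
    ]
  ]
}
```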

&lt;p&gt;I randomly generated most of these details; even then it took me half an hour to craft a character. The upside is that it's an entirely creative process of character creation, devoid of any engineering concern.&lt;/p&gt;

&lt;p&gt;The inner workings of how these character details eventually get sent to LLMs are abstracted away by Eliza. The end result is impressive: the characters do speak with personalities consistent with what's configured. If you intend for one to sound quirky, given enough sample texts and background stories, it does come out sounding quirky.&lt;/p&gt;

&lt;p&gt;In fact, that's Eliza's bread and butter: creating personalities. You know, as opposed to getting things done (being useful). Nothing wrong with that.&lt;/p&gt;

&lt;p&gt;Until I want to have two characters talking to each other. There's just no way to do it unless I'm willing to have the conversations take place publicly on Twitter. At least in this iteration, Eliza is simply built to run one single awesome Twitter chatbot.&lt;/p&gt;

&lt;p&gt;It feels like the fact that Eliza is written in TypeScript is a draw in itself. If that's true, I find it suspicious. The quality of a framework should stand on its own, not on its choice of ecosystem. Many products in the JavaScript world have this quality.&lt;/p&gt;

&lt;p&gt;That said, the number of plugins being made for Eliza is impressive as a social phenomenon. Plugins are responsible for giving agents abilities beyond just talking. If Eliza is to have a long shelf-life, these plugins will be the deciding factor.&lt;/p&gt;

&lt;p&gt;However, how &lt;a href="https://en.wikipedia.org/wiki/Lindy_effect" rel="noopener noreferrer"&gt;lindy&lt;/a&gt; Eliza is remains to be seen. From what I've dug up in its innards, Eliza feels overrated.&lt;/p&gt;

&lt;p&gt;That's an ironic statement. In the interest of giving you an idiot-proof method for creating an autonomous KOL, it does a lot on your behalf. Too much, I might say. If your use case goes just a little off its course, there's no easy way forward but to hack the framework.&lt;/p&gt;

&lt;p&gt;Subsequent research brought me to &lt;a href="https://microsoft.github.io/autogen/stable/" rel="noopener noreferrer"&gt;AutoGen&lt;/a&gt;. It does not try to do too much; not being opinionated is part of its deal.&lt;/p&gt;

&lt;p&gt;Most of all, it's trivial in AutoGen to have multiple agents talk to each other. They've got the whole team infrastructure figured out.&lt;/p&gt;
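
&lt;p&gt;AutoGen's actual team API is richer than this (and changes across versions), but the mechanic it hands you boils down to a round-robin loop over a shared transcript. A framework-free sketch, with stubbed agents in place of real LLM calls:&lt;/p&gt;

```python
from typing import Callable

# An "agent" here is just a name plus a reply function; in a real
# setup the reply function would send the transcript to an LLM.
Agent = tuple[str, Callable[[list[str]], str]]

def round_robin(agents: list[Agent], opening: str, turns: int) -> list[str]:
    """Pass a shared transcript around the agents, one turn each."""
    transcript = [opening]
    for i in range(turns):
        name, reply = agents[i % len(agents)]
        transcript.append(f"{name}: {reply(transcript)}")
    return transcript

# Stub agents that only react to the last message; swap in LLM calls here.
agents = [
    ("alpha", lambda t: f"riffing on '{t[-1]}'"),
    ("beta",  lambda t: f"doubting '{t[-1]}'"),
    ("gamma", lambda t: f"one-upping '{t[-1]}'"),
]
transcript = round_robin(agents, "what is novelty?", turns=6)
```

&lt;p&gt;Swap the lambdas for functions that send the transcript to an LLM and you have the trippy three-way conversation I was after.&lt;/p&gt;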

&lt;p&gt;Once I tried it, though, I found it's not as intuitive to create characters that speak as coherently as Eliza-made characters. Tweaking characters' behaviors took repeated prompt engineering. But I am better off tweaking prompts in English than wrangling a framework to get what I want.&lt;/p&gt;

&lt;p&gt;AutoGen was created to get things done. It comes with agents that browse the web, handle files and run code. All the things an autonomous being is expected to do.&lt;/p&gt;

&lt;p&gt;As far as I can tell, it does not come with RAG built in. The ready-made browser agent is also giving me a hard time with Playwright, which somehow doesn't install cleanly on Arch Linux.&lt;/p&gt;

&lt;p&gt;AutoGen does not have the social validation of Eliza, but it does have the deep pockets of Microsoft behind it. I'm more optimistic about AutoGen's lindy-ness.&lt;/p&gt;

&lt;p&gt;It's a matter of time before I get my agents to write and run their own code. For now they'll have to settle for making trippy conversations among themselves.&lt;/p&gt;

&lt;p&gt;Before I get there though, I have this inkling that code written by machines should not be subjected to the same criteria as code written for humans. Though like it or not, LLMs still end up writing code like humans.&lt;/p&gt;

&lt;p&gt;But fundamentally, if this is code written autonomously by machines, for themselves, there's no reason it should be arranged in ways that are human-friendly.&lt;/p&gt;

&lt;p&gt;It's conceivable to me that machines prefer writing assembly code because it makes more sense to them.&lt;/p&gt;

&lt;p&gt;It can be argued that we still want code legibility as a safeguard, so humans can peek at it, but ultimately that's a losing battle. It amounts to demanding that a new life form conform to human standards, which is tenuous at best.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
    </item>
    <item>
      <title>Deep dive into Mentat coding assistant</title>
      <dc:creator>Y.K. Goon</dc:creator>
      <pubDate>Wed, 24 Jul 2024 09:52:16 +0000</pubDate>
      <link>https://dev.to/ykgoon/deep-dive-into-mentat-coding-assistant-12no</link>
      <guid>https://dev.to/ykgoon/deep-dive-into-mentat-coding-assistant-12no</guid>
      <description>&lt;p&gt;Following up from playing with &lt;a href="https://dev.to/ykgoon/the-future-of-programming-classical-vs-assisted-coding-with-mentat-and-aider-2d9f"&gt;Aider coding assistant&lt;/a&gt;, I've been using &lt;a href="https://docs.mentat.ai/" rel="noopener noreferrer"&gt;Mentat&lt;/a&gt; to code for the past few weeks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparing with Aider
&lt;/h2&gt;

&lt;p&gt;The way it works is similar. It runs in the terminal with a textual UI. It even lets you set light- and dark-mode color schemes. Coding sessions involve adding relevant files for context. In the chatbox, you make your wishes and it comes back with an action plan involving code changes. Approve them and the changes are made.&lt;/p&gt;

&lt;p&gt;The experience as a whole is close to Aider, except Aider makes a git commit for every change (which you can opt out of). Mentat leaves version management to you.&lt;/p&gt;

&lt;p&gt;The quality of how you phrase your &lt;em&gt;wishes&lt;/em&gt; determines the quality of the work. You have to be pretty verbose about it. What comes back is a function of your choice of LLM. I won't attribute smartness to the coding assistants, but I would credit them with a superior development experience, if any. No matter which one you use, you still have to talk to them like a clueless intern.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context limit
&lt;/h2&gt;

&lt;p&gt;Out of the box, Mentat supports a meager list of LLMs compared to Aider (that might or might not change). I didn't let that be a problem; I hooked it up to use a coding LLM on &lt;a href="https://www.together.ai/" rel="noopener noreferrer"&gt;Together.ai&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But it didn't matter; I ran into the context window limit right off the bat. Granted, some of my files are &lt;em&gt;production-length&lt;/em&gt;, but I didn't include that many of them. I didn't even get the chance to make it do something clever yet. I was determined to make this work, but the context limit kept getting in the way.&lt;/p&gt;

&lt;p&gt;The solution isn't to &lt;em&gt;just use an LLM with a larger context limit&lt;/em&gt;. There's always an upper limit; you'd just end up hitting that constantly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Built my own RAG
&lt;/h2&gt;

&lt;p&gt;I heard RAG is the answer. So I built a &lt;strong&gt;middleware&lt;/strong&gt; that sits between Mentat and the LLM.&lt;/p&gt;

&lt;p&gt;This is an OpenAI-compatible REST API (&lt;code&gt;http://localhost:&amp;lt;port&amp;gt;/chat/completions&lt;/code&gt;) running locally, all housed in one Python file. I call it &lt;em&gt;Broken Sword&lt;/em&gt; for easy reference.&lt;/p&gt;

&lt;p&gt;As far as Mentat is concerned, &lt;em&gt;Broken Sword&lt;/em&gt; is an actual LLM service. Within &lt;em&gt;Broken Sword&lt;/em&gt; I capture Mentat's requests, massage the inputs, send them to any LLM I want, and return the response in an OpenAI-compatible way. In doing this, I get to see the elaborate directives given by Mentat; that is what prompt engineering looks like.&lt;/p&gt;
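
&lt;p&gt;The post doesn't include &lt;em&gt;Broken Sword&lt;/em&gt;'s code, but the shaping such a middleware has to do looks roughly like this. The backend call is stubbed out here; in the real thing this function sits behind a Flask route at &lt;code&gt;/chat/completions&lt;/code&gt; and forwards to an actual provider:&lt;/p&gt;

```python
import time
import uuid

def call_backend_llm(messages: list) -> str:
    # Placeholder: the real middleware massages these messages and
    # forwards them to whichever LLM provider you choose.
    return "stubbed completion"

def chat_completions(request_body: dict) -> dict:
    """Wrap a backend reply in an OpenAI-compatible response body."""
    reply = call_backend_llm(request_body.get("messages", []))
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": request_body.get("model", "broken-sword"),
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": reply},
            "finish_reason": "stop",
        }],
    }

response = chat_completions(
    {"model": "gemini-1.5", "messages": [{"role": "user", "content": "hi"}]}
)
```

&lt;p&gt;Because the response has the shape an OpenAI client expects, Mentat never knows the difference.&lt;/p&gt;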

&lt;p&gt;Just by doing this I've enabled Mentat to use any LLM available to mankind. I proceeded to use Google Gemini 1.5 to power &lt;em&gt;Broken Sword&lt;/em&gt;, mostly because it has the right balance of quality and cost.&lt;/p&gt;

&lt;p&gt;This alone does not solve the context window limit though. This is no more than a glorified pipe.&lt;/p&gt;

&lt;p&gt;Rather than sending inputs from Mentat verbatim, the huge amount of context can be indexed in a &lt;em&gt;vector database&lt;/em&gt;: chunks of text get turned into &lt;em&gt;embeddings&lt;/em&gt;, long vectors of numbers that capture meaning. For each request, only the chunks most similar to it are retrieved and sent along, which keeps the prompt much smaller than the original texts.&lt;/p&gt;

&lt;p&gt;I made all that work using &lt;strong&gt;LangChain&lt;/strong&gt; (it has the whole series of steps abstracted away), with a dash of Flask for a simple API. It felt like cheating, since I don't yet know how this magic works, but I wanted to hack things together fast. I know they say you don't really need LangChain and I believe them, but some day man, some day.&lt;/p&gt;

&lt;h2&gt;
  
  
  It works
&lt;/h2&gt;

&lt;p&gt;When I was done, Mentat worked like it's supposed to. I made it write unit tests, and they came out in a style consistent with the existing ones. I made it write a GitHub Actions workflow, and the result was sensible.&lt;/p&gt;

&lt;p&gt;It was gratifying when it worked. Knowing I made it work with &lt;em&gt;Broken Sword&lt;/em&gt; is doubly satisfying.&lt;/p&gt;

&lt;p&gt;Which got me wondering: why does Mentat not use RAG or a vector database like I just did? It felt almost trivial to do so. I took a browse through the Mentat codebase, and indeed Chroma DB (the same vector DB I use) is in there. So maybe they are doing RAG somehow, just not in ways that matter to me.&lt;/p&gt;

&lt;h2&gt;
  
  
  But it's clunky
&lt;/h2&gt;

&lt;p&gt;As I put Mentat to work more and more, the clunkiness becomes apparent. It would crash from time to time. Sometimes because the LLM didn't come back with something it likes, but most of the time for reasons unknown to me. Graceful failure isn't its strength.&lt;/p&gt;

&lt;p&gt;There were times when Mentat would crash right after I made a request. Upon relaunching and re-including the relevant files, I repeated the same request (good thing there's a chat history to make this easy) and everything worked out.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mixture of hand-coding
&lt;/h2&gt;

&lt;p&gt;One question I was hoping to answer in this adventure is the right mixture of using a coding assistant this way and directly editing files when solving a problem. That is: should all coding, if possible, be done from the coding assistant alone? Or are we expected to have a code editor ready on the next screen?&lt;/p&gt;

&lt;p&gt;In my case, half of my screen is for Mentat, the other half for Emacs. I expect Mentat to grant me most of what I want, imperfectly, and I make minor adjustments by hand to the same files in Emacs.&lt;/p&gt;

&lt;p&gt;If Mentat-style coding assistants have a future, I wonder if that's the way it should be.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
    </item>
    <item>
      <title>The Future of Programming: Classical vs. Assisted Coding with Mentat and Aider</title>
      <dc:creator>Y.K. Goon</dc:creator>
      <pubDate>Wed, 19 Jun 2024 02:29:27 +0000</pubDate>
      <link>https://dev.to/ykgoon/the-future-of-programming-classical-vs-assisted-coding-with-mentat-and-aider-2d9f</link>
      <guid>https://dev.to/ykgoon/the-future-of-programming-classical-vs-assisted-coding-with-mentat-and-aider-2d9f</guid>
      <description>&lt;p&gt;I was trying to catch a glimpse of the future of programming alongside LLMs. I think I ended up discovering a whole new art.&lt;/p&gt;

&lt;p&gt;When you bring up AI and coding assistants, most people think of GitHub Copilot and similar alternatives. By now I can confidently say this: code completion is &lt;em&gt;not&lt;/em&gt; the future. At best, it's just cute.&lt;/p&gt;

&lt;p&gt;The discussion around this among seasoned coders is complicated. It's not that we're unwilling to use Copilot. But when we know our turf well, having code completion as an assistant gets in the way half the time. So much of it requires learning a different workflow to complement our existing one. By the time I've explained enough to the machine, I could've coded the solution myself. It's unclear if adopting a different workflow is worthwhile.&lt;/p&gt;

&lt;p&gt;On the other hand, there's a sense that a lot of the reluctance comes from ignorance of what these tools can achieve. It's similar to learning vim keybindings: I hesitated for many years, but once I'd suffered through the learning curve, I swore by it.&lt;/p&gt;

&lt;p&gt;So I put in some time to explore something entirely different. Instead of code-completion tools, I looked at &lt;em&gt;coding assistants&lt;/em&gt; that live up to the name's true meaning. I narrowed the field down to two: Mentat and Aider.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mentat
&lt;/h2&gt;

&lt;p&gt;I tried &lt;a href="https://www.mentat.ai/"&gt;Mentat&lt;/a&gt; first, seemingly the smaller project of the two. The &lt;a href="https://www.youtube.com/watch?v=lODjaWclwpY"&gt;demo&lt;/a&gt; looks promising; you should take a look first.&lt;/p&gt;

&lt;p&gt;It's a terminal-based application. Installation via &lt;code&gt;pip&lt;/code&gt; is easy. It's made with the Textual TUI library, which is a nice touch.&lt;/p&gt;

&lt;p&gt;The UX had me at hello. It doesn't try to code with me in Emacs. Instead, I tell it what I want and it will try to deliver in the right places across the project.&lt;/p&gt;

&lt;p&gt;To get it to work, I hooked up Mentat to use a coding LLM by Phind hosted by Together AI.&lt;/p&gt;

&lt;p&gt;Next I had to pick a problem domain. This was my first mistake: I tried using it to solve a bug in my day job, setting it to work on a code base that is nine years old by now.&lt;/p&gt;

&lt;p&gt;That broke any available context window limit from the get-go.&lt;/p&gt;

&lt;p&gt;See, when working with Mentat we get to specify the relevant files to work on. Code changes by the machine would happen on those files. These files get submitted to the LLM as context (possibly on top of git logs too).&lt;/p&gt;

&lt;p&gt;A single Python test file of mine runs up to 3,000 lines, easily. No LLM would want to entertain that.&lt;/p&gt;
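
&lt;p&gt;A common rule of thumb is that one token is roughly four characters of English or code, so you can estimate the damage before submitting files. The ratio is an approximation, not a guarantee:&lt;/p&gt;

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English and code.
    return len(text) // 4

def fits_context(files: dict, window: int) -> bool:
    """Whether the combined file contents fit in a context window."""
    return window >= sum(estimate_tokens(body) for body in files.values())

# A 3,000-line test file at ~40 characters per line:
big_file = ("x" * 40 + "\n") * 3000
print(estimate_tokens(big_file))                   # ~30,000 tokens
print(fits_context({"tests.py": big_file}, 8192))  # a single file blows an 8k window
```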

&lt;p&gt;This obstacle got me thinking about fine-tuning a model on my entire code base, or some solution involving RAG. That can get quite involved, and it feels premature. Before I go there, I might as well try Aider first. I shall circle back to Mentat in the future.&lt;/p&gt;

&lt;h2&gt;
  
  
  Aider
&lt;/h2&gt;

&lt;p&gt;Watch the &lt;a href="https://aider.chat/"&gt;demo&lt;/a&gt; first.&lt;/p&gt;

&lt;p&gt;The UX and concepts involved here are similar to Mentat. The difference though is Aider supports Google's Gemini, which has the largest context window out there. If it can't handle my code base, nobody can.&lt;/p&gt;

&lt;p&gt;And indeed it could not. I did the setup (similarly via &lt;code&gt;pip&lt;/code&gt;), worked on the same files from my large code base, and Gemini refused to return anything at all.&lt;/p&gt;

&lt;p&gt;By now I think I'm making it do things it's not designed to. Most demos like this start idealistically, without the burden of a 9-year-old code base. So I pulled something out of my idea bank (things I wanted to code but never got to) and made Aider code it from scratch. Now Aider worked as advertised.&lt;/p&gt;

&lt;p&gt;This project is a web browser extension meant to render web pages within a 3D scene, to be used in a VR device. The details of the application are immaterial. What matters is that it makes use of Three.js and various pieces of the JavaScript stack, something I'm not invested in and therefore out of my depth with.&lt;/p&gt;

&lt;p&gt;From the get-go Aider created the entire set of boilerplate files, enough for it to work as an empty browser extension. I subsequently spent the whole day working with Aider to get the project to a point where it successfully integrated Three.js.&lt;/p&gt;

&lt;p&gt;Now I can start reflecting on the experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it's really like
&lt;/h2&gt;

&lt;p&gt;Without Aider, a substantial amount of my time would've been spent yak-shaving. That includes setting up manifest files by hand, configuring, doing it wrong, and Googling back and forth. All of this is low-value work that makes sense to hand to machines. I wouldn't have taken the project this far in one day coding it myself.&lt;/p&gt;

&lt;p&gt;The real action takes place after the first hour. I made a point of telling it what I want like I would a junior coder, sparing it from making assumptions. That worked out well.&lt;/p&gt;

&lt;p&gt;When it gets things wrong, it needs help correcting its own mistakes. Chances are it's because I was not specific about what I was asking for.&lt;/p&gt;

&lt;p&gt;When Aider did something unknowingly wrong, I didn't know enough to correct it and assumed it was correct. Further work was built on top of that mistake and cascaded into larger mistakes.&lt;/p&gt;

&lt;p&gt;There are two facets to mistakes. First, when Aider makes mistakes on its own, it needs a human's help in pointing them out. Doing so involves being specific about the solution; just saying the outcome is wrong is not helpful.&lt;/p&gt;

&lt;p&gt;Secondly, the reason I was not specific enough about my request was because I didn't know enough about the intended solution to ask for it. Therefore Aider does &lt;em&gt;not&lt;/em&gt; free you from knowing your stack and technical intricacies.&lt;/p&gt;

&lt;p&gt;About testing: this is highly domain-specific. Had I been doing backend work, I would've had Aider code my test cases for me. However, mine is a VR project, so it's still down to me to test by clicking around the browser. I think in most projects, Aider will end up encouraging a test-driven approach by making test cases easy to create.&lt;/p&gt;

&lt;p&gt;With coding assistants, it's not the case that you ask for the result and it delivers the solution. For any non-trivial problem, you have to iterate with it to come to the right solution. Until machines can reason on their own, the human is the reasoning component in this loop.&lt;/p&gt;

&lt;p&gt;Like most new skills, learning to get good at working with coding assistants will make you slower before it makes you faster.&lt;/p&gt;

&lt;p&gt;Which leads me to declare this: AI-assisted coding is an entirely different art. It's not better than &lt;em&gt;classical coding&lt;/em&gt; (I have to coin that here); it's not worse either. It's different like Judo and Muay Thai; comparison is unfair without context.&lt;/p&gt;

&lt;h2&gt;
  
  
  Classical vs Assisted
&lt;/h2&gt;

&lt;p&gt;Now that I've established two different approaches to coding, I can now engage in some speculation.&lt;/p&gt;

&lt;p&gt;Here's an easy one: assisted coding works well on popular programming languages (simply because LLMs are well-trained on them). Projects in &lt;em&gt;artisanal&lt;/em&gt; languages (let me introduce you to &lt;a href="https://developers.urbit.org/guides/core/hoon-school/A-intro"&gt;Hoon&lt;/a&gt;) have no choice but to be handcrafted the classical way.&lt;/p&gt;

&lt;p&gt;Classical coders are about &lt;em&gt;how&lt;/em&gt;; assisted coders are about &lt;em&gt;what&lt;/em&gt;. Consequently, assisted projects achieve objectives faster, but classical projects maintain better.&lt;/p&gt;

&lt;p&gt;Should any given software project in the future be done with a mixture of the assisted approach and the classical? I suspect &lt;strong&gt;no&lt;/strong&gt;. If a code base is assisted code to begin with, there should be minimal classical intervention.&lt;/p&gt;

&lt;p&gt;Conversely a classical code base should not be tainted by assisted code commits. Even if this has no quality implication, I think it will be socially demanded by team members.&lt;/p&gt;

&lt;p&gt;I can't qualify this point beyond falling back to my intuition, but this aspect will be interesting to observe.&lt;/p&gt;

&lt;p&gt;I wonder how collaboration works differently for an assistedly-coded project. Would problems in a typical FOSS project still exist? If not, is the same pull request workflow of a classical project still relevant?&lt;/p&gt;

&lt;p&gt;The final point is how the physical limits of LLMs affect engineering approaches. Let's assume there will always be a limit to context windows in LLMs, no matter how much fine-tuning and RAG are applied.&lt;/p&gt;

&lt;p&gt;I think assisted projects are likely to discourage monoliths. Because LLMs can't fit a big monolith in their figurative heads, humans work around it by breaking the monolith into pieces. The result ends up looking like microservices, whether the problem domain demands it or not.&lt;/p&gt;

&lt;p&gt;Some may argue that's universally a good thing. That remains to be seen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Going forward
&lt;/h2&gt;

&lt;p&gt;This will be an ongoing research. I'm hopeful to see my toy project to the end.&lt;/p&gt;

&lt;p&gt;I may try Mentat again on a new project at some point.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
    </item>
    <item>
      <title>Dirty code</title>
      <dc:creator>Y.K. Goon</dc:creator>
      <pubDate>Tue, 06 Feb 2024 10:09:44 +0000</pubDate>
      <link>https://dev.to/ykgoon/dirty-code-4k9b</link>
      <guid>https://dev.to/ykgoon/dirty-code-4k9b</guid>
      <description>&lt;p&gt;This is one of those posts that should've been a tweet. That is if I'm successful in distilling the essence.&lt;/p&gt;

&lt;p&gt;Technical debt has a special place in the hearts of software engineers. The privileged ones get to build a business on it, surviving long enough to accrue a lot of it and not fall apart.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Big Tech&lt;/em&gt; tends not to talk about it, preferring to act as if it has none. The real deal talk about it with a very slight hint of pride, like daredevils performing death-defying feats.&lt;/p&gt;

&lt;p&gt;There's a special breed of technical debt I want to highlight. For lack of a better term, I shall call it &lt;em&gt;dirty code&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The juxtaposition with Uncle Bob's &lt;em&gt;clean code&lt;/em&gt; is interesting. Clean code is made of a well-defined set of qualities. The lack of those attributes, however, does not make code &lt;em&gt;dirty&lt;/em&gt;; it is simply not-clean.&lt;/p&gt;

&lt;p&gt;Dirty code belongs in the realm of chaos. It is not compelled to confine itself to any arbitrary definition. It is, however, exceedingly easy to spot.&lt;/p&gt;

&lt;p&gt;Think of dirty code as low-class technical debt. Wrong placement of files; camel case instead of snake case; bad styling; poor indentation, etc.&lt;/p&gt;

&lt;p&gt;What makes code dirty is its overtness. Other kinds of technical debt and coding malpractice are about unintended consequences from illegible design choices.&lt;/p&gt;

&lt;p&gt;Not dirty code. It stares you in the face; the only way to not spot it is incompetence.&lt;/p&gt;

&lt;p&gt;It is the low-hanging-fruit kind of coding malpractice: easy to fix, but not easy enough to automate away. Linters sometimes exist to enforce this, but not every ecosystem has them; and in those that do, someone is sure to find them too constraining.&lt;/p&gt;

&lt;p&gt;Mostly it doesn't get fixed &lt;strong&gt;early&lt;/strong&gt; because it appears harmless, and we're too cool to bother with it.&lt;/p&gt;

&lt;p&gt;When does dirty code become a debt so intense that you reach the development equivalent of a &lt;a href="https://en.wikipedia.org/wiki/Minsky_moment"&gt;Minsky moment&lt;/a&gt;? How much is too much?&lt;/p&gt;

&lt;p&gt;Here's a suggestion: when it leaks into culture.&lt;/p&gt;

&lt;p&gt;When new members join the team, not all of them are confident or assertive enough to challenge the old ways. When they have to write something new on top of dirty code (broken windows), they have two choices: subvert it, or conform to it and leave it alone.&lt;/p&gt;

&lt;p&gt;The safe and probable choice is to stay consistent. Monkey see, monkey do. Now one piece of dirty code leads to two. The cycle repeats; two lead to four.&lt;/p&gt;

&lt;p&gt;Dirty code has now become ingrained in the culture.&lt;/p&gt;

&lt;p&gt;Even more significant design trade-offs don't have this kind of broken windows effect.&lt;/p&gt;

</description>
      <category>coding</category>
    </item>
    <item>
      <title>Feature flags piloting &amp; how to ruin it</title>
      <dc:creator>Y.K. Goon</dc:creator>
      <pubDate>Tue, 04 Jan 2022 06:15:51 +0000</pubDate>
      <link>https://dev.to/ykgoon/feature-flags-piloting-how-to-ruin-it-3ide</link>
      <guid>https://dev.to/ykgoon/feature-flags-piloting-how-to-ruin-it-3ide</guid>
      <description>&lt;p&gt;There is a semi-sophisticated thing we do in managing SaaS in production that hasn't been given a proper name yet. For lack of a better term, I'll resort to calling it &lt;em&gt;feature flag piloting&lt;/em&gt;. I reserve the right to rename it in the future.&lt;/p&gt;

&lt;p&gt;It's a great solution to a very specific set of problems. I'm here to explore what happens when you take it too far.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is feature flag piloting
&lt;/h2&gt;

&lt;p&gt;Say you have a spanking new feature, it's going to impact everyone when deployed. This is potentially risky and dangerously irreversible.&lt;/p&gt;

&lt;p&gt;What you can do instead is turn the feature on selectively. This is done by implementing &lt;em&gt;feature flags&lt;/em&gt; within your code logic. When the flag is on, the new feature executes. A feature flag can be global or user-specific; a global one affects everyone and is not selective.&lt;/p&gt;
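
&lt;p&gt;In its simplest form, the mechanism is just a lookup guarding the new code path. A minimal sketch; the flag names and in-memory storage are made up, and production systems usually keep this in a database or a flag service:&lt;/p&gt;

```python
# Hypothetical flag table: a global switch plus a per-user allow list.
FLAGS = {
    "new_checkout": {"global": False, "users": {"alice", "bob"}},
    "dark_mode":    {"global": True,  "users": set()},
}

def is_enabled(flag: str, user: str) -> bool:
    """A global flag affects everyone; otherwise check the user list."""
    cfg = FLAGS.get(flag)
    if cfg is None:
        return False  # unknown flags default to off
    return cfg["global"] or user in cfg["users"]

# The code path for the new feature branches on the flag:
def checkout(user: str) -> str:
    if is_enabled("new_checkout", user):
        return "new checkout flow"
    return "old checkout flow"
```

&lt;p&gt;Beta users get the new path; everyone else keeps cruising on the old one.&lt;/p&gt;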

&lt;h2&gt;
  
  
  Advantages
&lt;/h2&gt;

&lt;p&gt;With this mechanism you can enlist a handful of beta users to test out the feature. If mistakes happen, the damage is contained and fixes can take place before a wider rollout. Existing non-beta users are happily cruising along in your SaaS without feeling any impact.&lt;/p&gt;

&lt;p&gt;This is effectively changing the aircraft engine in mid-air. Whatever you can do to cut down your risk, you should do, and feature-flagging is a great way forward.&lt;/p&gt;

&lt;p&gt;This is about more than safety, too. It allows feature development to be agile, giving you the room to develop quickly, release for beta testing, get feedback, make corrections and repeat the loop rapidly, without the downside of widespread complaints.&lt;/p&gt;

&lt;h2&gt;
  
  
  But what if you take it too far?
&lt;/h2&gt;

&lt;p&gt;In what ways can this be taken too far? Here are a few:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Beta-testing keeps running and never finishes&lt;/li&gt;
&lt;li&gt;Feature flags are never removed&lt;/li&gt;
&lt;li&gt;Too many feature flags are running concurrently&lt;/li&gt;
&lt;li&gt;Inter-dependent feature flags (god no don't do that!)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While you're reaping the benefits of this approach, debt accrues. Over time the price comes in these forms:&lt;/p&gt;

&lt;h3&gt;
  
  
  Involving many humans in operations
&lt;/h3&gt;

&lt;p&gt;Sometimes feature flags are self-managed, a simple checkbox for the user to turn on. Sometimes that's too dangerous to even expose. In that case, turning on feature flags for pilot users involves getting in touch with support staff, backend engineers, maybe even product managers.&lt;/p&gt;

&lt;p&gt;In that chain of operations, multiple emails get passed around and unnecessary human mistakes get made. Multiply that by the number of beta users across time, and the man-hour cost adds up.&lt;/p&gt;

&lt;p&gt;Software is about taking humans out of the loop in the first place. This is a complete failure in that regard.&lt;/p&gt;

&lt;h3&gt;
  
  
  The UI is no longer the source of truth
&lt;/h3&gt;

&lt;p&gt;Very often feature flags are used to present different user interfaces to beta users.&lt;/p&gt;

&lt;p&gt;A beta user sees something different from other users. Over time there are two (or more) sets of realities for different sets of people. They may all be looking at the user-profile page (for instance) but experiencing something entirely different.&lt;/p&gt;

&lt;p&gt;Now you can no longer count on the UI alone to tell you what is supposed to happen. There needs to be another parameter (the feature flag) to tell you the expected behavior. This becomes a problem that needs additional product-management manpower to manage.&lt;/p&gt;

&lt;p&gt;Imagine a sole pilot user who is the only one getting a new feature. He's been left alone using it; six months later he contacts support for help. Support looks at his screenshot and is shocked: the user-profile page (for example) is nothing like anything they have seen before.&lt;/p&gt;

&lt;h3&gt;
  
  
  Difficult to debug
&lt;/h3&gt;

&lt;p&gt;Effectively, you have expanded the product's surface area. Double the area, double the potential for bugs.&lt;/p&gt;

&lt;p&gt;When the old and new versions (via feature flag) are running at the same time, you have doubled the maintenance cost while serving the same number of users.&lt;/p&gt;

&lt;p&gt;The default bug report is no longer enough to tell you whether the new feature is generating the problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Development cost is wasted
&lt;/h3&gt;

&lt;p&gt;A new feature is only as impactful as its exposure. By limiting it to beta users only, you may never make back the cost of development.&lt;/p&gt;

&lt;h3&gt;
  
  
  Secret internal knowledge
&lt;/h3&gt;

&lt;p&gt;This whole approach is like ordering off-menu in a high-class restaurant. It's cool for the few people involved, but it benefits only them.&lt;/p&gt;

&lt;p&gt;Most internal staff don't know about features that have long been hidden in the beta phase. When some users request one, some staff won't know to ask for it to be turned on.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to manage it better
&lt;/h2&gt;

&lt;p&gt;Here are a few suggestions on what you can do if you must have feature flags:&lt;/p&gt;

&lt;h3&gt;
  
  
  Limited beta time window
&lt;/h3&gt;

&lt;p&gt;Don't let it run forever. Set a time limit, maybe three months, maybe two weeks.&lt;/p&gt;

&lt;p&gt;When the time is up, evaluate, release the feature widely and deprecate the flag accordingly.&lt;/p&gt;
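&lt;p&gt;One way to enforce the time limit is to attach a review date to the flag itself, so an overdue beta surfaces on its own. A minimal sketch, assuming a hypothetical in-code flag registry; the flag name and dates are illustrative:&lt;/p&gt;

```python
from datetime import date

# Each flag carries a review-by date; the entries here are illustrative.
FLAGS = {
    "new_dashboard": {"enabled_for": {"alice"}, "review_by": date(2025, 4, 1)},
}

def flags_due_for_review(today):
    # Past the deadline, the flag shows up and forces a decision:
    # release widely, or deprecate.
    return [name for name, flag in FLAGS.items() if today > flag["review_by"]]
```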

&lt;h3&gt;
  
  
  Set conditions to exit pilot phase
&lt;/h3&gt;

&lt;p&gt;Perhaps a fixed time window isn't good enough. Define your own criteria for what constitutes a satisfactory outcome before you feel safe releasing the feature to everyone.&lt;/p&gt;

&lt;p&gt;Maybe it's a collection of pilot users with diverse profiles. Define their characteristics, let them beta-test for a set amount of time (just not forever), then evaluate.&lt;/p&gt;

&lt;p&gt;Maybe you want to define it statistically: above an 85% success rate, with no bugs reported within a period of a week, exit the pilot phase.&lt;/p&gt;
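&lt;p&gt;Such an exit condition is easy to encode. A sketch with hypothetical counters; the 85% threshold and one-week quiet period are the ones suggested above:&lt;/p&gt;

```python
from datetime import date, timedelta

def pilot_can_exit(successes, total, last_bug_date, today):
    # Exit when the success rate tops 85% and no bug has been
    # reported for more than a week.
    if total == 0:
        return False
    quiet_for_a_week = (today - last_bug_date) > timedelta(days=7)
    return (successes / total) > 0.85 and quiet_for_a_week
```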

&lt;h3&gt;
  
  
  Wider release but partial
&lt;/h3&gt;

&lt;p&gt;To further hedge the bet, you may opt to exit the pilot phase by releasing only to half the population of users who didn't opt in as beta users.&lt;/p&gt;

&lt;p&gt;The engineering cost of this is highly subjective, just know that it's an option.&lt;/p&gt;
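&lt;p&gt;A common way to implement such a partial release is to bucket users deterministically by hashing their id, so a given user always lands in the same cohort across sessions. A sketch; the function name and ids are illustrative:&lt;/p&gt;

```python
import hashlib

def in_rollout(user_id, percentage=50):
    # Hash the user id into a stable bucket from 0 to 99; buckets below
    # the percentage cutoff get the feature.
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return percentage > bucket
```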

&lt;p&gt;But the same point applies: eventually this has to end and a complete release ought to happen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;There may be better ways to manage a pilot phase while you're in the middle of it, but that's outside the scope of my concern here. In the long run it's better not to stay in it than to have two versions of the same product running at the same time (A/B tests don't count).&lt;/p&gt;

&lt;p&gt;As far as management tactics go, it's a simple matter of setting an alarm to go off on a certain date, telling you to re-evaluate a specific feature.&lt;/p&gt;

&lt;p&gt;Any good thing can be taken too far. It doesn't invalidate the approach, just be mindful of the limits.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>webdev</category>
      <category>management</category>
    </item>
    <item>
      <title>Flirting with Nyxt browser</title>
      <dc:creator>Y.K. Goon</dc:creator>
      <pubDate>Fri, 24 Sep 2021 03:26:16 +0000</pubDate>
      <link>https://dev.to/ykgoon/flirting-with-nyxt-browser-3001</link>
      <guid>https://dev.to/ykgoon/flirting-with-nyxt-browser-3001</guid>
      <description>&lt;h2&gt;
  
  
  The browser is an OS
&lt;/h2&gt;

&lt;p&gt;The web browser is the most important operating system inside your operating system. If it's not the most important software you use, it's probably a close second.&lt;/p&gt;

&lt;p&gt;When a tool is that important, you are better served having a deep level of control over it. It should behave exactly the way you want, and the way you tell it to behave should come in the form of code that's shareable.&lt;/p&gt;

&lt;p&gt;That's what &lt;a href="https://nyxt.atlas.engineer/"&gt;Nyxt&lt;/a&gt; is about. I've been fascinated by it since last year.&lt;/p&gt;

&lt;p&gt;It's been progressing well, version 2 right now is a vast improvement over version 1.&lt;/p&gt;

&lt;p&gt;To understand what Nyxt is trying to do, you have to first understand what emacs really is.&lt;/p&gt;

&lt;p&gt;To understand what emacs really is you have to use it. And not just open it and take a look (there's nothing to look at), you have to dive in and use it. Give it four months and you may get the idea. Hearing people describe emacs is like reading the descriptions of a feeling. You really only get it when you feel it yourself.&lt;/p&gt;

&lt;p&gt;Few people will get it but those who do will be die-hard about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trying it as a secondary browser
&lt;/h2&gt;

&lt;p&gt;Switching browsers is typically a low-cost move. Bring over your bookmarks, install a handful of extensions and you're mostly done.&lt;/p&gt;

&lt;p&gt;But Nyxt isn't like that and I know it. So I considered adopting it as a secondary browser. But I'm not sure what secondary even means. At best I arrived at doing non-mission-critical things on Nyxt, like Twitter.&lt;/p&gt;

&lt;p&gt;So I did that.&lt;/p&gt;

&lt;p&gt;Nyxt is just not ready for prime time. It may get there eventually, but probably not until version 4 or so.&lt;/p&gt;

&lt;p&gt;It sells itself as keyboard-driven, but even in that respect it's clunkier than Firefox with Vim Vixen installed.&lt;/p&gt;

&lt;p&gt;The built-in dark mode is serviceable but nothing like what Dark Reader can do.&lt;/p&gt;

&lt;p&gt;Support for Chrome extensions is apparently a work in progress.&lt;/p&gt;

&lt;p&gt;And it doesn't even come with reader-mode yet. Now that's a deal breaker.&lt;/p&gt;

&lt;p&gt;I know that with enough custom code I could take care of all the above, but the cost would be prohibitively high.&lt;/p&gt;

&lt;h2&gt;
  
  
  If it ain't broken
&lt;/h2&gt;

&lt;p&gt;I can't yet build a strong case for why I invested time in Nyxt other than wanting to play with new toys. Firefox is working very well, and for now it's in no danger of being dethroned.&lt;/p&gt;

&lt;p&gt;I don't mind that Nyxt falls short for now, the project has gotten enough traction that it will likely go on.&lt;/p&gt;

&lt;p&gt;But I care about my tools. I want them to be powerful, and the less of them the better.&lt;/p&gt;

&lt;p&gt;There's nothing out there like emacs, but Nyxt comes very close. It has the right intention, something completely unique in the browser space.&lt;/p&gt;

</description>
      <category>browser</category>
      <category>desktop</category>
      <category>emacs</category>
    </item>
    <item>
      <title>Wabi Sabi Limit</title>
      <dc:creator>Y.K. Goon</dc:creator>
      <pubDate>Wed, 15 Sep 2021 07:19:31 +0000</pubDate>
      <link>https://dev.to/ykgoon/wabi-sabi-limit-1128</link>
      <guid>https://dev.to/ykgoon/wabi-sabi-limit-1128</guid>
      <description>&lt;p&gt;Wabi sabi carries the sprit of embracing inherint brokenness. &lt;/p&gt;

&lt;p&gt;The wabi sabi limit is the point where the cost of maintenance exceeds the risk of re-building your software from the ground up. A point where technical debt has gotten so high, bankruptcy is the only choice.&lt;/p&gt;

&lt;p&gt;This is an exploration of the nature of wabi sabi limit.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://blog.acolyer.org/2020/02/14/programs-life-cycles-laws/"&gt;The law of program evolution&lt;/a&gt; dictates that we're gonna run into wabi sabi limit if our software survive long enough.&lt;/p&gt;

&lt;p&gt;For most teams, re-writing production software from scratch is to be avoided. The reasons are best summarized by "remember Netscape 5", and have been better elaborated by many other people before.&lt;/p&gt;

&lt;p&gt;But re-writes are not avoided at all costs. They do happen; decisions to do them get reached by reasonably smart people, presumably some of them with really good reasons, even taking into account what we know today. Clearly some of them think the cost and risk are worth paying.&lt;/p&gt;

&lt;p&gt;What are the costs? Too many to count, but one interesting pre-requisite for engaging in a re-write is 100% automated test coverage.&lt;/p&gt;

&lt;p&gt;No, it's not true; that would be ridiculous. That would rule out 99% of software in production.&lt;/p&gt;

&lt;p&gt;Which sounds right. I'm willing to bet the true pre-requisite is probably close to 100% coverage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://engineering.fb.com/data-infrastructure/messenger/"&gt;Facebook Messenger&lt;/a&gt; proudly re-wrote their iOS app with acceptable results (I assume, I'm not a user). I also wanna guess they have the luxury of freezing features for the new build to catch up. Most software that matters can't afford feature-freeze. It'll be interesting to see how FB came to make this bet.&lt;/p&gt;

&lt;p&gt;How did FB know they'd hit the wabi sabi limit? How conscious were they of this limit? How much of the decision was quantified? And of that quantification, how much of it is full of shit?&lt;/p&gt;

&lt;p&gt;How are we supposed to find our wabi sabi limit? It's a tempting question, but I highly suspect it's the wrong thing to ask.&lt;/p&gt;

&lt;p&gt;It's tempting because we wanna imagine a two-dimensional graph where the cost of maintenance squares off against the risk of a re-write. As soon as we get hold of a model, we can point at it for the suits, and they'll have to take it as data.&lt;/p&gt;

&lt;p&gt;Nope, I'm more interested in the truth. But you go ahead anyway, the suits will love it.&lt;/p&gt;

&lt;p&gt;The cost of maintenance can be calculated if we squint hard enough, but I can't take a model seriously once the risks are quantified.&lt;/p&gt;

&lt;p&gt;There's any number of value-at-risk equations you can throw at it to get close to modelling reality. But every risk variable we have would be made out of layers of assumptions; it's assumption-turtles all the way down.&lt;/p&gt;

&lt;p&gt;If we can't quantify the wabi sabi limit, it sounds like it isn't much good for helping us make decisions. But maybe that's not the point.&lt;/p&gt;

&lt;p&gt;The wabi sabi limit has more power as a mythical figure. The point is for it to be the creature behind the curtain, a cautionary tale.&lt;/p&gt;

&lt;p&gt;It's not meant to be figured out, but to be actively avoided. Because by the time you run into it, it's too late.&lt;/p&gt;

&lt;p&gt;One approach is to treat the wabi sabi limit as an &lt;strong&gt;inevitability&lt;/strong&gt;. If we see impermanence in everything, all pieces are susceptible to being thrown away and replaced.&lt;/p&gt;

&lt;p&gt;Then it helps to decouple, to build microservices, to isolate code rot, to minimize the surface area of re-writes. Basically everything they've been telling you is a good idea.&lt;/p&gt;

&lt;p&gt;The wabi sabi limit is not static. Even if you hold the re-write risk constant, the cost of maintenance stands a chance of going down given the will.&lt;/p&gt;

&lt;p&gt;The wabi sabi limit is not solely an engineering concern. The suits arguably care more about the risk equation. In fact, engineers enjoy re-writes regardless of the business consequences.&lt;/p&gt;

&lt;p&gt;Acknowledging this limit allows both camps to point to a bogeyman. Not in such a way that measuring it ends up as a gamed metric, but in a way that lets the team develop a negative capability for dodging the risky bullets of re-writes.&lt;/p&gt;

</description>
      <category>design</category>
    </item>
  </channel>
</rss>
