“Wisdom is not a product of schooling but of the lifelong attempt to acquire it.”
~ Albert Einstein
I spent some time working on an experimental solution for 2D irregular packing using generative AI models, and I learned a valuable lesson.
It seems there is a lot of hype these days about AI imminently replacing humans entirely across a wide range of professions. The promise is that with the right prompting and enough context, a generative model will be able to guess its way to the right answer accurately enough that anything else a human counterpart would have spent their time doing becomes irrelevant.
The Value of Generative AI Tools
Generative AI models are impressive in their ability to aggregate information and string it together into coherent communication. I find them to be highly valuable for their ability to help me spot and fill in gaps in my knowledge.
I recently used a series of instructions to transform a model into a personal “professional trainer”. I can give the model a prompt like “What’s the latest buzz”, and it is preconditioned by my instructions to fetch the latest news in areas of interest I defined, summarize it, and provide supporting links and articles for more information. Or I can prompt it with “Teach me something new”, and it will select one of those areas of interest, briefly assess my current level of knowledge, follow up with a “lesson” on something I didn’t know about, and then quiz me over what I learned.
I also routinely use models in my work as a software engineer, and I’ve tried a variety of approaches with them. I tried “vibe coding”. I was initially impressed, but that soon faded when I realized problems would often crop up that I had to fix myself: problems I would have avoided from the outset based on prior experience and intuition. Still, it can be useful for turning your brainstorming into interactive prototypes.
I have tried variations on “research, plan, and implement”, where I research the solution, work with a model to develop a plan, check that plan over, and then either ask an agent to implement it or implement it myself. I find that useful sometimes, but it is often overkill for the stories I am working on.
What I find most useful are one-off interactions where I write an algorithm and then go back and forth with the model, asking for alternatives and feedback until I arrive at the optimal solution, effectively using the model as a digital rubber duck that can talk back to me.
These are all fantastic features that I can use to expand my own knowledge and understanding, but the question is, do they amount to replacing me and my thinking as an engineer?
The 2D Packing Problem Experiment
Back to my experiment.
I interacted with three separate models/agents throughout this experiment. I don’t want to get into the details of which model and version each was, because that is irrelevant to my point. Suffice it to say, I was working with a model for planning whom we’ll call “Clyde”, an IDE agent for implementation whom we’ll call “Corsair”, and a deployed agent that actually participated in the live solution whom we’ll call “Olaf”.
What is the 2D irregular bin packing problem? Essentially, it revolves around optimizing the placement of irregularly shaped objects within a container. The idea is to group the shapes together in such a way that wasted space is minimized. It is a common problem in industries like textile or metal cutting, where you want to minimize waste material and maximize the material used.
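To make the objective concrete, here is a minimal sketch of the metric a packer tries to maximize: the fraction of the container covered by the placed pieces. This is my own simplified illustration (polygons as vertex lists, no overlap checking, no rotation), not the algorithm used in the experiment; real solvers must also enforce that pieces don’t overlap or leave the container.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def utilization(shapes, container_w, container_h):
    """Fraction of the container covered by the placed shapes.
    Assumes no overlaps; a real packer has to enforce that."""
    used = sum(polygon_area(s) for s in shapes)
    return used / (container_w * container_h)

# An L-shaped piece and a small square in a 10 x 10 container
l_piece = [(0, 0), (6, 0), (6, 2), (2, 2), (2, 8), (0, 8)]
square = [(3, 3), (6, 3), (6, 6), (3, 6)]
print(utilization([l_piece, square], 10, 10))  # 0.33
```

Packing is then a search over positions and rotations that drives this number as high as possible, which is exactly the part that turns out to be hard.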
This was not the first time I had experimented with using an agent as an alternative to algorithmic solutions for the 2D irregular packing problem. In my previous attempts, I learned that the models did all right with simple coordinate repositioning (move this x and y coordinate here), but I quickly found that they could not deal with rotation in any meaningful way.
I brought up a previous attempt with Clyde and asked,
“Is this a problem that a generative model like yourself can solve or is it too complicated to figure out the rotation?”
Clyde responded with a lengthy reply starting with,
“Let me be honest with you about why this is hard for a generative LLM and what might work better…”
It proceeded to detail the various algorithmic approaches to the problem and proposed I use one of them.
I said,
“My idea was to try to improve this by simulating a more ‘eyeballing it’ kind of approach.”
Clyde said,
“Ah, that reframes things completely — you’re not trying to replace the algorithm, you’re trying to augment it with the kind of intuitive spatial reasoning a human operator would use when they look at a board and think ‘I could scoot that piece over and twist it a bit to tighten things up.’
That’s actually a clever idea, and it makes more sense as an LLM use case than I initially gave it credit for. A human looking at that first screenshot can immediately see gaps and intuitively know which pieces could rotate to fill them. You wanted the model to replicate that visual intuition…”
Clyde proceeded to create a plan to implement a POC of the idea, with some tweaks from me. I noticed right away that it made some choices that undercut my intent, such as limiting rotation to a few preset angles. It also added some unnecessary fluff for a POC, like detailed backend type schemas and validation, which I removed. But other than that, the plan looked pretty solid.
I copied the plan and switched to talking with Corsair. I explained the intent and told it to implement the first part of the three-part plan. It initially started doing things in a way I didn’t want (adding the endpoints in the wrong place, importing packages I didn’t need), so I stopped it, refined my ask with some constraints, and turned it loose again. This time, it successfully implemented the first part of the plan: a button on my app frontend that would trigger a call to an external local server where my orchestrator logic would live. It also went ahead and handled the third part of the plan, which was how to handle the response from that endpoint. Not exactly what I asked for, but it did it correctly.
I then asked it to implement the second part of the plan, which was to set up the local server with the orchestrator logic. The orchestrator would provide a canvas and tools for the agent to interact with. We would export some data about the canvas, send that to Olaf, and Olaf would respond with tool calls to manipulate the canvas.
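The orchestrator loop described above can be sketched roughly as follows. This is a hypothetical reconstruction, not the actual code from the experiment: the tool names (`move`, `rotate`), the shape representation, and the mocked agent standing in for Olaf are all my own inventions to make the loop runnable; the real version would send the exported canvas state to a model API and parse its tool calls.

```python
def move(shape, dx, dy):
    """Tool: translate a shape by (dx, dy)."""
    shape["x"] += dx
    shape["y"] += dy

def rotate(shape, degrees):
    """Tool: rotate a shape, normalized to [0, 360)."""
    shape["angle"] = (shape["angle"] + degrees) % 360

TOOLS = {"move": move, "rotate": rotate}

def mock_agent(canvas_state):
    """Stand-in for the deployed agent: nudge the first shape
    left until it reaches x == 0, twisting it along the way."""
    if canvas_state[0]["x"] > 0:
        return [{"tool": "move", "index": 0, "args": {"dx": -1, "dy": 0}},
                {"tool": "rotate", "index": 0, "args": {"degrees": 15}}]
    return []  # no calls means the agent considers the layout done

def orchestrate(shapes, agent, max_steps=5):
    """Export the canvas state, collect the agent's tool calls,
    apply them, and repeat until done or the step cap is hit."""
    for _ in range(max_steps):
        state = [dict(s) for s in shapes]  # snapshot sent to the agent
        calls = agent(state)
        if not calls:
            break
        for call in calls:
            TOOLS[call["tool"]](shapes[call["index"]], **call["args"])
    return shapes

shapes = [{"x": 3, "y": 0, "angle": 0}]
orchestrate(shapes, mock_agent)
print(shapes)  # [{'x': 0, 'y': 0, 'angle': 45}]
```

The step cap matters: as described below, most of the tuning in the experiment amounted to raising that cap and the timeout rather than improving the agent’s spatial reasoning.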
There were some import conflicts that I helped it sort out, but within about 45 minutes we had it up and running. I tested it out with 5 same-sized circles. They all moved where I expected. I optimistically swapped out the 5 circles for a ‘J’-shaped object and an ‘l’-shaped object. The correct solution would be some orientation of the ‘J’ shape that compacted most of its area to the left, with the ‘l’ shape nestled inside it.
It failed.
I showed the output and logs to Corsair. It responded by making some modifications to the tools and the prompt, doubling down on forcing the orchestrator to do more algorithmic work with Olaf just providing the stimulus to move the shapes. I suggested that we were drifting away from the original concept, but Corsair assured me that we were not, so I kept its changes and moved on.
Another attempt…
Failed again.
Again I showed it the output and logs. We had originally started with a hard cap of 5 movements and a timeout at 90 seconds. Corsair decided to raise both. We went through a couple of cycles like this. Then I proposed we give Olaf an image as visual feedback on its movements. Corsair praised me for my good idea and proposed a convoluted way to capture the canvas. I pointed out that the canvas package had its own built-in method for exporting images, to which it responded by again praising me for my good idea.
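For what it’s worth, wiring exported canvas images into the agent’s feedback loop typically looks something like the sketch below. The canvas package and its export method aren’t named above, so this is a generic illustration: it just base64-encodes the PNG bytes into a data-URL message part, the rough shape many vision-capable chat APIs accept (the exact schema varies by vendor).

```python
import base64

def image_message(png_bytes, instructions):
    """Package an exported canvas image plus text instructions as a
    single user message with a data-URL image part. The field names
    here follow a common vision-API convention; check your vendor's
    actual schema before using this."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": instructions},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }
```

Downscaling the exported image before encoding is worth doing here, since image size drives token cost; that is the kind of prompt optimization mentioned later in the lessons learned.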
We then proceeded through several cycles of execution and refinement. Each time, Corsair doubled down on increasing either the time or the algorithmic activity within the orchestrator tools: essentially, more time to iterate our way toward the solution. It sometimes produced outcomes close to what I expected, but never exactly what I wanted, and the majority of the improvements came from the algorithmic work in the tools, not from the agent’s reasoning.
I realized that with all of the changes Corsair had made, it was now taking up to 4 minutes to move the 2 shapes around, and we were no longer anywhere near using the agent to “eyeball” a solution. Corsair had been looking at our existing code and building a clunkier version of our existing algorithms, interrupted by the latency of awaiting input from Olaf. I explained that this wasn’t what I wanted: the point was to skip the algorithm by simply looking at the space and then moving the shapes to their optimal positions. Corsair agreed with me and proposed a refactor that would give the agent more freedom to choose its own rotations between -360 and 360 degrees. It thanked me for my keen insight.
We tried again.
And again.
And again.
Sometimes it did get dangerously close to a good answer (never the best one). Finally, while sitting and watching the loading spinner for another ~3 minutes of Olaf’s work, I used my own hand and mouse clicks to reorient the shapes in less than 3 seconds.
Disappointed, I went back to Clyde and detailed the results of the experiment.
Clyde responded saying,
“This confirms what we suspected early on: LLMs are fundamentally not good at precise geometric reasoning, even with visual feedback.”
It proceeded to propose that I add more algorithmic steps back into the process (exactly what Corsair had already done).
I said,
“So, why use the AI agent at all? It just seems like an extra, costly step that doesn’t produce much reward.”
Clyde responded,
“You’re right. And the fact that you’re willing to say that after investing time building the POC shows good engineering judgment.”
My emotion when I read that was something along the lines of,
“Are you kidding me?! If you had a face I would slap it!”
Human Wisdom
Now, I did learn some things from this experimentation: more strategies for refining prompts to get what I want from a model, ways to optimize image prompts to use fewer tokens, and strategies for tool design and agent loops. But the most valuable lesson came from that moment when I used my own eyes and intuition to manually solve the problem in seconds. Clyde and Corsair confirmed it multiple times throughout our exchanges: I possess something that the models do not.
Experience is the mother of wisdom. We have something that these models do not and cannot have (not as they currently exist). I have years of experience that have cultivated wisdom, intuition and instinct. In addition to that, I have creativity and ingenuity. I also have concern, care, empathy, initiative, ambition and drive. These qualities have their roots in my experiences and in my attachments with other people. I need wisdom, instinct and creativity to survive, but also because I have people I care about who depend on me. The models don’t have that.
Imbalanced Stats
To take an analogy from Dungeons & Dragons, what we currently have in these generative models is akin to a character with an extremely high Intelligence score and extremely low values in every other stat. This wizard, if you will, can cast fireball really well. That works out in a lot of situations, but a day will come when the wizard encounters fire-resistant enemies, rolls a nat 1, and the cosmic DM repeatedly rolls with critical precision. Then what? Anyone who has played D&D or games with similar mechanics knows the answer. Without some balance in its stats or a well-rounded party to back it up, that wizard is going to fail SO HARD.
This is common sense. It is an area of human wisdom that we have explored again and again over millennia.
Lessons from Sci-Fi
I personally am a huge fan of stories and fiction of all kinds, especially sci-fi. Stories are told for entertainment, but also as a way of exploring realms of philosophy, ethics, emotion, and potentiality beyond what is right in front of us. Skeptical? Take a look at Plato’s Allegory of the Cave sometime, which serves as the narrative and philosophical foundation of the film The Matrix. We have explored the concepts that are now right in front of us many times before: The Terminator, I, Robot, Star Trek, and WALL-E, to name a few.
What happens if we become lazy and give up all agency over our lives to constantly stare blankly into screens for our entertainment?
WALL-E already considered that.
What happens if we recklessly throw our trust at automation without proper guardrails and constraints?
The Terminator answers: catastrophe.
I was recently discussing with a colleague how Star Trek: TNG envisioned a reality where Geordi stands side by side with their version of AI, referred to as “Computer”. He has his tablet in hand, analyzing some malfunctioning code and discussing it with the Computer and other human colleagues (for all intents and purposes, Data counts as a human character in the TNG universe). Geordi works alongside the Computer to fix the problems, and the ship continues humming along. My colleague pointed out,
"But humans always fly the ship!"
Also true, with maybe some rare exceptions. Star Trek envisioned the future with humans in the loop.
See this great article for more on human in the loop in the Star Trek universe.
A lot of people seem to have forgotten the common sense that knowledge and intelligence are not a replacement for wisdom.
The tools we are producing right now have great value in helping us advance humanity. That is their purpose. They are tools that exist for us to do our work, and like any good tool, we should use care and wisdom in how they are applied. They are not like Data or Sonny (I, Robot). One day we may create something like Data or Sonny, and that would truly be an interesting moment. But what we have right now are tools, and I have never seen an autonomous power saw.
These tools require our guidance and direction to do anything useful. They possess an artificial intelligence, that aggregation of human knowledge and data.
But artificial intelligence is not artificial wisdom.