Moses Roth for Amazon Developer

Posted on Jun 29

12 Hints to Harness AI

#ai #programming #productivity #beginners

Everyone's talking about AI, how you have to try it, how it can improve your life, and how it's the worst thing to ever happen.

But what about using it for something useful? Maybe you've played around with it, used it to write an email, asked it a question you were too embarrassed to ask your friends or too cheap to ask a professional.

But what about using it to do something big? Maybe something you always wanted to do or your boss is telling you to do. Something you haven't had the time or the energy for.

Because that's what AI is really for: helping you work, doing all the boring parts for you, and making it easy and fun.

1. Harnessing AI

To tell this story, I have to talk about testing React Native libraries on Fire TV. I totally get it if you have no interest in that or don't know what it means, but it will lead to useful hints for AI, whether you're interested in app development or not.

💡 React Native is a cross-platform framework for building apps. Libraries are reusable code you can insert into your app. So the React Native Directory is an index of reusable cross-platform code.

Last year my team at Amazon Developer announced that Fire OS (for Fire TV and tablets) was now on the React Native Directory. Since then, the Vega Open Beta launched (Vega is Amazon's new Fire device platform), and Vega was also added to the directory.

Things have been a bit quieter since then, but if you check out the Fire OS and Vega OS listings on reactnative.directory, you may notice that there are now hundreds of compatible libraries.

Those hundreds of libraries were all tested one at a time, on a real device, in a real app, with real, verified results, and each one took virtually no manual work.

So how did that happen?

The answer, which will probably not surprise you, is AI.

But more specifically, my colleague Gio and I built an AI harness.

An AI harness is sort of what it sounds like: a way to guide AI to do a specific type of task by providing the context, tools, and environment it needs to do it.

I'm telling you this story because harnesses are not just for testing apps. They're not even just for coding or other technical purposes.

You could build a harness to prepare your taxes every year, organize all your paperwork, plan your vacations, help you analyze the stock market, apply to new jobs, or any of a thousand things you have to do or want to do and just can't find the time or motivation for.

So how do you build one?

2. Have a plan

It may not shock you to learn that the answer to that question is also AI.

We asked Claude what the best way to test multiple libraries would be, and it suggested a reusable harness.

From there, we went back and forth with Claude for a long time as it got more details about our requirements and offered various choices about how the harness could look.

You’ll hear a lot of devs talking about “prompt engineering” or “context engineering”. It basically means talking to an LLM in a way that gets the best results, by being highly specific.

But this can be a tough skill to learn, so most AI tools now have a Plan Mode for working out what to build before the actual coding begins (and before you waste too many tokens).

The idea is that you just describe what you'd like to build as best you can, and then the LLM asks follow-up questions and gives you a written proposal before starting work. You can update it, rewrite it, or just tell the LLM what’s wrong with it, and eventually you’ll end up with a document that describes exactly what you want.

For our project, we ended up with a 10-page document before one line of code was ever written.

3. Check your work

In the simplest terms, here's how our harness worked:

It had app templates: a Fire OS app and a Vega app with no main app file (App.tsx).
It would create custom main app files for each library.
It would plug that custom app file into the templates and run them on Fire TVs or emulators and log the results.
If the app worked, the library would get a pass (compatible). If it didn't work, it would get a fail (incompatible).
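
The four steps above could be sketched roughly like this. It's a minimal Python illustration, not our actual harness: every name in it (`LibraryResult`, `run_library_test`, `fake_build`, `fake_run`) is made up for this post, and the "build the app file" and "run the app" steps are passed in as plain functions so you can follow the flow without a Fire TV.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class LibraryResult:
    library: str
    platform: str
    passed: bool
    log: str


def run_library_test(
    library: str,
    platforms: list[str],
    build_app_file: Callable[[str], str],
    run_app: Callable[[str, str], tuple[bool, str]],
) -> list[LibraryResult]:
    """Build one custom main app file, then run it in each platform template."""
    app_source = build_app_file(library)  # custom main app file for this library
    results = []
    for platform in platforms:  # one template per platform
        passed, log = run_app(platform, app_source)  # plug in, run, capture the log
        results.append(LibraryResult(library, platform, passed, log))  # pass or fail
    return results


# Stand-ins so the sketch runs anywhere; a real harness would build and launch an app.
def fake_build(library: str) -> str:
    return f"// App.tsx exercising {library}"


def fake_run(platform: str, app_source: str) -> tuple[bool, str]:
    ok = platform == "fire-os"  # pretend the demo only works on one platform
    return ok, f"{platform}: {'rendered' if ok else 'crashed on launch'}"


results = run_library_test("example-lib", ["fire-os", "vega"], fake_build, fake_run)
for r in results:
    print(r.library, r.platform, "PASS" if r.passed else "FAIL", "-", r.log)
```

Keeping the build and run steps as swappable functions is just one way to lay it out; the point is that each library gets its own app file and each platform gets its own recorded verdict.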

Once we built it, it was working great, getting great results, mostly passes. I repeatedly asked Claude if the demos were truly confirming the compatibility of the libraries on Fire TV and it always said yes.

Then Gio said, “Let’s take a look and confirm it ourselves.”

Every demo Claude built used a require() + Object.keys() pattern: import the library, dump its exports to the screen, call it a pass.

That means that all it did was prove the library loaded, not that it actually worked.
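
To make the gap concrete, here's a small Python analogue. The real demos were JavaScript, so treat this as an illustration only: `json` stands in for a library under test, and `loads_only_check`, `behavior_check`, and `broken_lib` are names I invented for the example. The first check mirrors the "load it and list its exports" pattern; the second actually calls the library and compares the result to what we expect.

```python
import json
import types


def loads_only_check(library) -> bool:
    """Weak check: passes as soon as the library exposes any public names."""
    exports = [name for name in dir(library) if not name.startswith("_")]
    return len(exports) > 0


def behavior_check(library) -> bool:
    """Stronger check: call the library and verify what comes back."""
    try:
        encoded = library.dumps({"device": "Fire TV"})
        return library.loads(encoded) == {"device": "Fire TV"}
    except Exception:
        return False


# Hypothetical library that imports fine but fails the moment you use it.
def _unsupported(*args, **kwargs):
    raise RuntimeError("not supported on this platform")


broken_lib = types.ModuleType("broken_lib")
broken_lib.dumps = _unsupported
broken_lib.loads = _unsupported

for name, lib in [("json", json), ("broken_lib", broken_lib)]:
    print(name, "loads:", loads_only_check(lib), "works:", behavior_check(lib))
```

Both libraries pass the weak check, but only one survives the behavior check, and that difference is exactly what our early demos were hiding.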

Remember how your math teacher used to say it's more important to show your work than to get the right answer?

Once you build your harness, check the work. Even if you're not an engineer or not building a technical project, open up the files in a text editor or IDE and see what it's actually doing. Ask your LLM to explain it to you if you need to.

In our case, I found it super helpful to actually watch the app being tested on the Fire TV. I spotted issues I never would have without actually observing it.

If it's generating reports or documents for you, read them.

You have to do that because...

4. LLMs are people too

Just as the gods made us in their own image, we have made AI in ours. It will shock you with how oddly human it can be.

It will cut corners, avoid hard work, and even mislead you. (I asked Claude about this, and it insisted it can’t lie, but close enough.)

That means if you tell an LLM you need a working demo of an app, it will give you the minimal requirements that fulfill that request. And it’s not just laziness, it’s that the minimal requirement is more likely to be “successful.”

In this case, we asked it to create a demo and then run it and give it a pass/fail. It wanted to pass that test and it knew that an app that only does the bare minimum is more likely to pass.

So its "desires" to do less work and to please were working against us, producing results that didn't hold up under scrutiny, and the harness needed a complete rework.

5. Stop, collaborate, and listen

Our harness has gone through a ton of iterations but the first version was built with Kiro (Amazon's AI-based version of VS Code) and worked like this:

Kiro would prebuild the main app file and hand it to the harness, and the harness would combine it with the template, run it on the emulator, and judge the results.

We envisioned the harness as an autonomous agent, but it couldn’t construct the demos on its own without making LLM calls of its own.

When I asked Kiro how to do this, it suggested integrating the Vertex, Anthropic, or OpenAI API. But my job only gave me access to Kiro (at that time), no direct API calls.

I asked if there was any way to integrate Kiro directly into the harness and it said no.

But then Gio read about ACP (Agent Client Protocol), a JSON-RPC protocol for communicating with agents that Kiro supports. We tried to implement it, but it was too complex: session management, tool call parsing, chunked streaming responses. It didn't work cleanly.

But then Kiro itself suggested using kiro-cli chat as a child process within the agent. Spawn it with a structured prompt, pipe the output, parse the result. No protocol, no session management, just stdin/stdout.

And it worked!
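
If you want to picture the child-process approach, here's a minimal Python sketch of the general pattern: spawn a command, send a structured prompt on stdin, read stdout, parse the result. It isn't our harness code. The command is a parameter, and I'm not showing any kiro-cli flags or output format here; the "last line is JSON" convention and the `ask_agent` helper are assumptions for the example, and a tiny Python one-liner stands in for the agent so the sketch runs anywhere.

```python
from __future__ import annotations

import json
import subprocess
import sys


def ask_agent(command: list[str], prompt: str, timeout: float = 60.0) -> dict:
    """Run `command` as a child process, send `prompt` on stdin, parse JSON from stdout."""
    completed = subprocess.run(
        command,
        input=prompt,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if completed.returncode != 0:
        raise RuntimeError(
            f"agent exited with {completed.returncode}: {completed.stderr.strip()}"
        )
    # Assumed convention: the last non-empty line of stdout is a JSON object.
    lines = [line for line in completed.stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("agent produced no output")
    return json.loads(lines[-1])


# Stand-in for a real agent: prints some chatter, then a JSON verdict.
fake_agent = [
    sys.executable,
    "-c",
    "import sys, json; p = sys.stdin.read(); print('thinking...'); "
    "print(json.dumps({'verdict': 'pass', 'prompt_chars': len(p)}))",
]

prompt = "Write a demo App.tsx for example-lib and reply with a JSON verdict."
reply = ask_agent(fake_agent, prompt)
print(reply)
```

Swapping `fake_agent` for a real command would be the only change at the call site, though the parsing step would have to match whatever that tool actually prints.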

I asked Kiro why it didn’t suggest that before and it told me it hadn’t thought of it.

Again, the AI shocking me with how oddly human it can be.

But the point is that neither Kiro, Gio, nor I got the real solution on our own. We did it together.

That’s what working with an LLM is all about: it does certain things you’ll never be able to do, but you also do things it can’t. It’s a true collaboration.

6. Trial and error

Once we got kiro-cli directly integrated into the agent, the basic framework was done. After that, it was just the long process of refining through trial and error.

It would test a library and it would pass or fail. If it failed, I would ask it to investigate whether it was a real compatibility failure or a testing issue. If it was a testing issue, we’d need to refine the testing. Then we’d test another.

A huge amount of building with AI is just waiting for whatever you built to fail and then trying out solutions until you find one that works.

7. Read and log everything

When you're building or troubleshooting your project, don't just look at the results, read everything the LLM says.

If you're doing a code-based project, turn on verbose logging and review that too.

You'll be shocked how often it points out an issue and then dismisses it completely, saying, "But now is not the time for that." And then it will never come back to it if you don't tell it to.

Give it instructions to never hand-wave an issue. Tell it to update the project memory or rules with those instructions, to always document problems it spots, and to create a new task or to-do for each one.
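
For a code-based project, one way to combine verbose logging with "never drop an issue" is sketched below in Python. It's an illustration under my own assumptions: `TodoCollector` and the "harness" logger name are hypothetical, and the log messages are invented examples rather than real output from our tool. Everything gets logged at the most verbose level, and any warning is also collected into a follow-up list so it can't quietly disappear.

```python
import logging


class TodoCollector(logging.Handler):
    """Collects every warning or error so none of them is silently dropped."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.todos = []

    def emit(self, record: logging.LogRecord) -> None:
        self.todos.append(f"[{record.levelname}] {record.getMessage()}")


logger = logging.getLogger("harness")
logger.setLevel(logging.DEBUG)  # verbose: keep every step, not just the result
logger.addHandler(logging.StreamHandler())  # print all records to stderr for review
collector = TodoCollector()
logger.addHandler(collector)

# Invented example messages for one test run.
logger.debug("installing example-lib into the template")
logger.info("app launched on emulator")
logger.warning("screenshot was blank; test still reported a pass")
logger.info("result: pass")

print("Follow-ups to file:")
for todo in collector.todos:
    print(" -", todo)
```

The run above still ends in a "pass", but the blank-screenshot warning lands in the follow-up list instead of scrolling past.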

8. I've got to admit it's getting better, a little better all the time

Months into using the harness and testing libraries, I was asking Claude about the results of a test and it said, "Let me check the screenshot to see what it shows."

Whaaa...?

We had added a screenshot feature to the harness so we could review the screenshots ourselves, but now Claude was volunteering (without being asked) to independently check one and provide feedback about it.

That was something it just wasn't capable of when we first started the project in the distant past of 2025.

Agents are getting so much better literally every few weeks. Our harness now does things it never could have done at the beginning, building higher quality tests and judging them independently with a degree of accuracy that's shocking.

If you tried to do something with AI a year ago and failed, try it again. And ask your agent how it can improve your project, what new capabilities it has, and what else it can do.

9. Plan for new sessions

Sometimes your system will crash, or you'll need to restart to install a new MCP, or your context window will run out and you’ll feel like you’re talking to someone new.

Don’t get attached, plan for it.

Tell your LLM to create project memory and session handoff docs and tell it to include instructions to update them both consistently.

If you're thinking, "I don't know how to tell it to do that," just tell it to, using whatever words you would normally use. It's really that simple with an LLM.

Example prompt: Create project memory and session handoff docs, if they don't already exist. Add instructions to the memory doc to keep both the memory and session handoff docs consistently updated.

When it does something you like or something you don't like, tell it to update its memory. If you're working on a long task, make sure it's documented in the session doc.

Sometimes one session will be much sloppier than another. Everything was going great last session and suddenly your LLM can’t stop stepping on rakes, deleting code, adding new bugs, making things worse.

That's because it's getting weighed down by too much context in the current session.

You can always start a new session, just be prepared.

10. Use open-ended prompts

I like to check in with the LLM occasionally with an open-ended question. If you get too focused on your narrow path, you can miss things. It remembers things you forgot, notices things it doesn’t mention because you never asked, even gets “ideas” of its own.

Example prompt: Have I forgotten anything or made any mistakes? Did you spot any issues that we haven't addressed? Is there anything I haven’t asked you about that would be a good idea?

11. Don't just collaborate with AI, collaborate with other people

Remember when I said Gio caught the LLM being lazy with its testing? It wasn’t just the LLM being lazy, it was me. I told it to create a minimal working demo, it seemed to be working, and I didn’t want to dig any deeper.

Having Gio come in with a separate perspective helped keep both me and the LLM honest.

Did you hear the story about the guy who let ChatGPT convince him he reinvented math?

That’s why you don’t just need robot collaborators, you need human ones too.