Nearly all developers are taking advantage of AI coding tools, but what happens when you work in a particular niche, where not only are LLMs less knowledgeable, but the knowledge they do have is based on the general case, which is straight up wrong for your niche?
This is what I found when making Power Platform Code Apps. Code Apps are full React apps that are hosted in the Power Platform. For that reason they have a specific SDK and set of CLI commands. Models like Opus 4.6 do a good job considering how niche it is, but imagine when you then build something bespoke on top of that niche. That's what I did: as I don't use React, I wanted a vanilla JavaScript build (before you comment, yes, I know I should use React and TypeScript, but remember I'm a little strange). This meant there was simply no training data or even docs on it. And because it was vanilla JavaScript, all of the training data actually kept sending the LLM down the wrong path. So at this point I wanted to create my own coding agent.
In general there are 3 ways to get a more bespoke model:
- Build your own
- Fine-tune or distil an existing model (optionally with reinforcement learning)
- Prompt
You can imagine which path I chose, yep, the easy one.
- Prompt Overview
- Platform
- Implementation
- Learnings
1. Prompt Overview
If you already know this feel free to skip, but I wanted to give a quick overview of the full prompt stack.
There are many levels to a prompt, each level sitting above and taking precedence over the one below. You have:
Model System Prompt
This is the highest level, often added by the model owner, it is for things like security and legality. It will cover things like:
- Not to share how to do illegal actions
- Not to encourage harmful actions
- Not to share confidential information about the model

Not an exhaustive list.
Application System Prompt
Next level down, and this will be what the application you use adds. This covers specific tools the application provides, and patterns/mechanisms the app designer found to improve performance. Examples include:
- This Tool reads the local folder
- This Resource provides information about x
- When dealing with the CLI, ensure that authentication is validated first

Theoretical examples, as these are closely kept secrets.
Instruction.md
Instruction files are project-specific instructions. These generally cover things like:
- Naming conventions
- Folder structure
- Design principles
- What Skills you want to be used when
- Any learnings the LLM has made (you can set the agent to update the file itself)
This helps ensure the project is consistent and easier to read/maintain.
Skill.md
These kind of blur the line between prompt and context. They are specific instructions/knowledge sources for specific situations. By moving them out of the instruction file, the LLM decides when to pull them in (a kind of dynamic context/instruction).
Instructions and Skills were created by Anthropic and designed for Claude Code, but other tools and models can use them too, just with less consistent results or hierarchy handling.
Your Prompt
And finally we have your prompt that you send, with any additional context.
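To make the stack concrete, here is a rough sketch (in plain JavaScript, keeping with the theme) of how these layers might be flattened into the message array an agent sends. The function and field names are my own illustration, and the exact wire format varies by model and API; treat it as a mental model, not a spec.

```javascript
// Illustrative sketch of assembling the prompt stack into one request.
// The model owner's system prompt is injected server-side, so it never
// appears here; everything below is what the application controls.
function buildPromptStack({ appSystemPrompt, instructions, skills, history, userPrompt }) {
  const messages = [];
  // Application system prompt: the highest layer we control.
  messages.push({ role: "system", content: appSystemPrompt });
  // Project-level instruction.md content rides along with every request.
  if (instructions) {
    messages.push({ role: "system", content: `Project instructions:\n${instructions}` });
  }
  // Skills are only included when relevant to the current task.
  for (const skill of skills) {
    messages.push({ role: "system", content: `Skill:\n${skill}` });
  }
  // The full conversation history must be resent every time (LLMs are stateless).
  messages.push(...history);
  // And finally, your prompt.
  messages.push({ role: "user", content: userPrompt });
  return messages;
}

const stack = buildPromptStack({
  appSystemPrompt: "You are a CodeApp JS assistant.",
  instructions: "Use vanilla JavaScript, never React.",
  skills: ["How to create Power Platform connections."],
  history: [
    { role: "user", content: "hi" },
    { role: "assistant", content: "hello" },
  ],
  userPrompt: "Add a deploy button.",
});
console.log(stack.length); // 6 messages in this example
```

Notice that everything above the final line is overhead that gets resent on every single request, which matters later when we talk about context limits.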
2. Platform
There are a few ways I could approach this, but the ones I considered the most were:
- CLI Wrapper
- VS Code Extension
- GitHub Copilot Extensions
I use GitHub Copilot, so you can see why my choices are what they are. With my experience being in VS Code extensions, and the fact I like to hybrid-write code with the agent, it made sense for me to build there, though one day I want to try a CLI wrapper.
A call out here: you will see later that you can get most of the functionality with just a well set up workspace, covering instructions and skills, but that's no fun.
Building in VS Code has some killer advantages:
- Built in auth to GitHub Copilot
- Built in terminal
- Can have custom UI
- Expand beyond text with buttons etc
- No hosting/backend
- Distributed for free on the VS Code Marketplace
With the platform decided, the next step was the build.
3. Implementation
My AI Coding Agent was going to have 3 main advantages over plain GitHub Copilot:
- Simplified UI based CLI commands
- System Prompt - built specifically to ensure the LLM didn't use its JavaScript training to go off plan
- Skill files - all of the learnings from implementing CodeApp JS
Simplified UI based CLI commands
For Code Apps to work they require the Power Platform CLI, which gives the ability to authenticate, create connections, and deploy to the Power Platform environment. So to make things more LowCode and easier, I wanted to abstract away those commands and make them more integrated.
For this I built a kind of pipeline based on button presses. These:
- Set up the CodeApp JS files
- Authenticated with the tenant
- Listed and selected an environment (updating config files too)
- Created connections based on what code was written
- Deployed the app
System Prompt and Skill.md
This is where most of the value comes from. I've banded them together because they are created in the same way, the only difference being the system prompt is always used while skills are specific to the task.
To create the right prompts and skills I kind of trained the model, and by that I mean I built lots of apps. During the process I made sure I read the reasoning, and every time there was a bug, built the resolution into the files.
Looking at my GitHub Copilot usage, can you guess the days I was testing/learning?
It's time consuming, boring, and expensive, but by working this way the idea is to only make the mistake once.
And that's the real secret sauce, all I'm doing is learning from experience and then documenting it so that the agent knows what to do and what not to do.
I think this process of training/learning is key to all AI development, including Copilot Studio. You need to test, test, test, test, with each test generating a nudge/tweak to your instruction/prompt. This iterative approach improves results and consistency from your LLMs.
4. Learnings
There are a couple of things that definitely got me, and the biggest was dealing with context/prompt stack. I found when using the agent on my personal Copilot license it worked perfectly, but when I used my work account I would get failures with "No choices", and this was because my work account has lower context limits. This meant I could send less data to the agent, which was not good considering how small the code base is compared to full applications.
How you build your agent can massively impact the issue with context limits. In case you didn't know, the prompt/context stack is everything you have to send to the LLM. The LLM has no memory, so you have to send all of the history each time. And with the use of tools you don't actually send one request, but multiple. This can create a huge stack. Below shows this:
As you can see in the above, not only were all of the system prompts etc. sent, they were sent 5 times (and this is a simple example).
But what is worse is the last message to the LLM; the context stack at the end of this back and forth can get big, very big. As you can see, the LLM wants to read files, write the code, and then double check/test it by reading again. And that's how you can run out of context even with small projects.
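A toy simulation makes the growth obvious. The token estimate below is a crude characters-divided-by-four heuristic purely for illustration, but it shows why a handful of tool calls that each pull in a file can blow past a context limit: every round trip resends the entire history so far, so total tokens sent grow roughly quadratically with the number of calls.

```javascript
// Crude character-based token estimate, for illustration only.
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Simulates N tool-call round trips: each request resends the system
// prompt plus the full history, then appends the new tool result.
function simulateTurns(systemPrompt, toolResults) {
  const history = [];
  let totalTokensSent = 0;
  for (const result of toolResults) {
    const request = [systemPrompt, ...history].join("\n");
    totalTokensSent += estimateTokens(request);
    history.push(`tool result: ${result}`);
  }
  return totalTokensSent;
}

const fileRead = "x".repeat(2000); // a ~500-token file read
const oneCall = simulateTurns("system", [fileRead]);
const fiveCalls = simulateTurns("system", Array(5).fill(fileRead));
console.log(fiveCalls > oneCall * 5); // true: growth is superlinear
```

Five tool calls do not cost five times one tool call; they cost far more, because calls two through five each carry all the earlier results with them.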
There are a few ways to fix this (and a few bugs I found in my code).
The first and most obvious is getting your system prompts, instruction.md, and skill.md content down to the minimum. Make sure they are concise, don't duplicate, and are not too verbose.
Next, make sure you send only the right data. This is where one of my bugs was: I was sending all of the skill.md files every time. I tried getting the extension to pick the right files by keywords, but this often meant they were missed. In the end I passed a list/description to the LLM and got it to decide which ones it wanted. The other thing to do is use file trees and keep them tight. File trees let the LLM know which files are related and send only the right ones. You can set how wide they are, but wider equals more text sent.
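The skill-selection fix can be sketched like this: instead of sending every skill.md up front, send only a catalogue of names and one-line descriptions, and parse the model's reply to decide which full files to attach. The catalogue entries and parsing below are my own illustration, not the extension's actual code.

```javascript
// Hypothetical skill catalogue: name + one-line description per skill.md.
const skillCatalog = [
  { name: "connections", description: "Creating Power Platform connections" },
  { name: "deployment", description: "Deploying the app with the CLI" },
  { name: "ui-patterns", description: "Vanilla JS UI patterns for Code Apps" },
];

// Builds the cheap prompt that asks the LLM which skills it wants.
function catalogPrompt(catalog) {
  const lines = catalog.map((s) => `- ${s.name}: ${s.description}`);
  return `Reply with a comma-separated list of skill names you need:\n${lines.join("\n")}`;
}

// Parses the model's comma-separated reply back into catalogue entries,
// so only those skill.md files get loaded into the context.
function selectSkills(llmReply, catalog) {
  const wanted = llmReply.toLowerCase().split(",").map((s) => s.trim());
  return catalog.filter((s) => wanted.includes(s.name));
}

// e.g. if the model replies "connections, deployment":
const chosen = selectSkills("connections, deployment", skillCatalog);
console.log(chosen.map((s) => s.name)); // ["connections", "deployment"]
```

The catalogue costs a few dozen tokens per request, while the full skill files only get sent when the model actually asks for them.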
The final approach is to deal with the context history. As I said, you send all of the previous interactions back each time, but what if you could decrease that? The normal approach is compacting, where you ask the LLM to summarise the history and use that instead. It definitely works, but it can mean important context is lost. The approach I tried was to break up the process and then remove unnecessary information.
I did this by creating a decision log file that the LLM uses to store a todo list and any key decisions (also useful when returning for a new dev session). The LLM then breaks the project into tasks and completes them one by one. When one is complete, it updates the todo list and removes all of the tool calls and reasoning, leaving just the results.
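The pruning step can be sketched as a simple filter over the history: once a task is done, the intermediate tool outputs and reasoning messages get dropped, and only the request and the final result survive (the decision log file carries the durable state anyway). The message shape here is my own illustration.

```javascript
// Once a task completes, drop tool outputs and reasoning from the history,
// keeping only the user request and the assistant's final result.
function pruneCompletedTask(history) {
  return history.filter((msg) => msg.role !== "tool" && msg.type !== "reasoning");
}

const history = [
  { role: "user", content: "Add a save button" },
  { role: "assistant", type: "reasoning", content: "I should read app.js first..." },
  { role: "tool", content: "app.js contents..." },
  { role: "assistant", content: "Done: save button added to app.js" },
];

const pruned = pruneCompletedTask(history);
console.log(pruned.length); // 2: the request and the final result survive
```

In this toy example the file read and the reasoning (often the bulk of the tokens) disappear, while the durable facts remain in the history and the decision log.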
As with everything there are trade-offs: you don't want to remove too much context, especially if most of your users have higher limits (new models keep pushing the boundary of input/output token limits).
Another issue I found was the model; it has such a big impact on performance. Using Opus 4.6/GPT5.4 would deliver great results, but dropping to Auto or GPT4.1 would deliver terrible results. There isn't much I can do here but recommend better models on setup, and look at documenting how to prompt/structure your build better for older models.
The other issue I got was duplicate CLI installs. My laptop had an old CLI installed directly and an updated version from the Power Platform extension. This would lead to the extension giving the LLM the wrong CLI (the out-of-date one). This is a niche case, but it does show the risk in relying on external dependencies like CLIs and VS Code extensions. There isn't much you can do here, but it did drive me to build more into the UI, and to get the LLM to tell the user to use the buttons when it is having issues. The simple truth is code is deterministic, so whenever you can, move the functionality to old-fashioned code and UI.
And there you have it, if I can create my own AI Coding Agent, anyone can.
If you want to try the agent, you can get it for free here: https://marketplace.visualstudio.com/items?itemName=PowerDevBox.codeappjsplus, and if you want to learn more about CodeAppJS or contribute, check out codeappjs.com



