David Pereira

Originally published at blogit.create.pt

Becoming augmented by AI


We're deep into Co-Intelligence in Create IT's book club — definitely worth your time! Between that and the endless stream of LLM content online, I've been in full research mode. Still, I can't just watch and listen to others talk about these tools; I have to experiment myself and learn how to use them for my own use cases.

Software development is complex, and my job isn't just churning out code. Still, there are many concepts in this book that we've internalized and started adopting.
In this post, I'll share my opinions and some of the practical guidelines our team has been following to become augmented by AI.

The "Jagged Frontier" concept

The Jagged Frontier, described by the author Ethan Mollick, is an amazing concept in my opinion: tasks that appear to be of similar difficulty can be performed either better or worse by humans using AI. Because of the "jagged" nature of the frontier, the same knowledge workflow can have tasks on both sides of the frontier, according to a publication the author took part in.

This leads to the Centaur vs. Cyborg distinction, which is really interesting. Using both approaches (deeply integrated collaboration and separation of tasks) seems to be the way to achieve co-intelligence. One very important Cyborg practice seen in that publication is "push-back" and "demanding logic explanation": we disagree with the AI output, give it feedback, and ask it to reconsider and explain itself better. Or, as I often do, ask it to double-check against official documentation that what it's telling me is correct.
It's also important to understand that this frontier shifts as these models improve. Hence the focus on experimentation, to understand where the Jagged Frontier lies for each LLM. It's definitely knowledge that everyone in the industry wants to acquire right now (and maybe share afterwards 😅).

Becoming augmented by AI

I'm aware of the marketed productivity gains, like the claim that GitHub Copilot makes devs 55% faster, and the other studies that have been published about GenAI increasing productivity. I'm also aware of the studies claiming the opposite 😄, like the METR study showing AI makes devs 19% slower. However, I don't see 55% productivity gains for myself, and I don't think it makes me slower either.

In my opinion, productivity gains aren't measured by producing more code. Number of PRs? Nope. Acceptance rate for AI suggestions? Definitely not! I firmly believe the less code, the better. The less slop, the better too 😄. I'm currently focused on assessing DORA metrics and others for my team, because we want to measure whether AI-assisted coding, and the other ways we use AI as an augmentation tool, actually improves those metrics or makes them worse. The rest of the marketing and hype doesn't matter.

AI as a co-worker

For a tech lead who works with Azure services, an important skill is knowing how to leverage the right Azure services to build, deploy, and manage a scalable solution. So it becomes very useful to have an AI partner to talk this through with, for example about Azure Durable Functions. That conversation can be shallow and not get all the implementation details 100% correct. That's okay, because the tech lead (and any dev 😅) also needs to apply critical thinking and evaluate the AI's responses. This is not a skill we want to delegate to these models, at least in my opinion and in the author's opinion. There is a relevant research paper about this by Microsoft as well.
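
To make that concrete, this is the kind of building block such a conversation tends to revolve around: a minimal Durable Functions orchestration. It's sketched in Python purely for illustration (our actual stack is C#, and the function names here are made up):

```python
import azure.functions as func
import azure.durable_functions as df

# Durable Functions (Python v2 programming model): one orchestrator
# coordinating activity functions. All names below are hypothetical.
app = df.DFApp(http_auth_level=func.AuthLevel.FUNCTION)

@app.orchestration_trigger(context_name="context")
def process_order_orchestrator(context: df.DurableOrchestrationContext):
    order = context.get_input()
    # Activities run (and get replayed/retried) under the orchestrator's control.
    validated = yield context.call_activity("validate_order", order)
    result = yield context.call_activity("charge_payment", validated)
    return result

@app.activity_trigger(input_name="order")
def validate_order(order: dict) -> dict:
    # Plain business logic lives in activities.
    return {**order, "validated": True}

@app.activity_trigger(input_name="order")
def charge_payment(order: dict) -> dict:
    return {**order, "charged": True}
```

The interesting part of the conversation usually isn't this code, but the trade-offs around it: when an orchestration beats a queue-plus-worker design, how replay constrains non-deterministic code, and so on.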

The goal can simply be to have a conversation with a co-worker to spark new ideas or possible solutions we hadn't thought of. Using AI for ideation is a great use case, not just for engineering but for product features too: UI/UX, important metrics to capture, etc. If it generates 20 ideas, there's a higher chance you spot the bad ones, filter them out, and clear your mind or steer it toward better ideas. Here is an example of getting ideas on fixing a recurring exception:
(Screenshot: asking Claude for ideas on fixing a recurring exception)

It asks clarifying questions so that I can give it more useful context. Then I can look at the response, iterate, ask for more ideas, and so on. I almost always set these instructions for any LLM:

```
Ask clarifying questions before giving an answer. Keep explanations not too long. Try to be as insightful as possible, and remember to verify if a solution can be implemented when answering about Azure and architecture in general.
It's also very important for you to verify if there is official documentation that supports your claims and statements. Please find official documentation supporting your claims, before responding to a user. If there isn't documentation confirming your statement, don't include it in the response.
```
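
To show what this looks like in practice, here's a minimal sketch of wiring those instructions in as a system prompt and asking for ideas on a recurring exception. It uses the Anthropic Python SDK; the model name and the exception details are just placeholders:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Condensed version of the instructions above, passed as the system prompt.
SYSTEM_INSTRUCTIONS = """\
Ask clarifying questions before giving an answer. Keep explanations short.
Be as insightful as possible, and verify that a solution can actually be
implemented when answering about Azure and architecture in general.
Only make claims you can back with official documentation; if you can't
find documentation confirming a statement, leave it out of the response.
"""

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative model name
    max_tokens=1024,
    system=SYSTEM_INSTRUCTIONS,
    messages=[
        {
            "role": "user",
            "content": "We keep seeing a recurring TaskCanceledException in an "
                       "Azure Function that calls a downstream API. Give me a few "
                       "ideas for what could cause it and how to investigate.",
        }
    ],
)

print(response.content[0].text)
```

From here it's the same iterate-and-push-back loop: answer the clarifying questions, challenge weak suggestions, and ask for sources.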

That is also why it searches for docs. I've gotten way too many statements in LLM responses that, when I follow up on them, turn out to be errors or assumptions. When I ask it about a sentence it just gave me, I often only get "You're right - I was wrong about that"... Don't become over-reliant on these tools 😅.

AI as a co-teacher

With that said, the tech lead and senior devs are also responsible for upskilling their team by sharing knowledge and best practices, challenging juniors with more complex tasks, etc. And this part of the job isn't simple; it's hard to be a force multiplier that improves everyone around you. So, what if the tech lead could use AI this way, by creating reusable prompts, documentation, and custom agents? How about the tech lead uses AI as a co-teacher, and then shares how to do it with the rest of the team? All of these can then help onboard juniors and help them understand our codebase and our domain. The Claude Code best practices post also references onboarding as a use case that helps Anthropic engineers:

At Anthropic, using Claude Code in this way has become our core onboarding workflow, significantly improving ramp-up time and reducing load on other engineers.

A lot of onboarding time is spent on understanding the business logic and then how it's implemented. For juniors, it's also about the design patterns or codebase structure. So I really think this is a net-positive for the whole team.

My augmentation list

It might not be much, but these are essentially the tasks where I'm augmented by AI:

Technical:

  • Initial code review (e.g. nitpicks, typos), some stuff I should really just automate 😅
  • Generate summaries for the PR description
  • Architectural discussions, including trade-off and risk analysis
    • Draft an ADR (Architecture decision record) based on my analysis and arguments
  • Co-Teacher and Co-Worker
    • "Deep Research" and discussion about possible solutions
    • Learn new tech with analogies or specific Azure features
    • Find new sources of information (e.g. blog posts, official docs, conference talks)
  • Troubleshooting for specific infrastructure problems
    • Generating KQL queries (e.g. rendering charts, analyzing traces & exceptions & dependencies; see the sketch after these lists)
  • Refactoring and documentation suggestions
  • Generation of new unit tests given X scenarios

Non-technical:

  • Summarizing book chapters/blog posts or videos (e.g. NotebookLM)
  • Role play in various scenarios (e.g. book discussions)
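
As a concrete example of the KQL item above, this is the kind of query I end up asking for and then running. A sketch using the azure-monitor-query SDK; the workspace ID is a placeholder, and the table/column names assume workspace-based Application Insights:

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient, LogsQueryStatus

# KQL an LLM is quite good at drafting: group recurring exceptions
# by problem id over the last day (table/column names are illustrative).
QUERY = """
AppExceptions
| where TimeGenerated > ago(1d)
| summarize occurrences = count() by ProblemId, OuterMessage
| order by occurrences desc
| take 10
"""

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",  # placeholder
    query=QUERY,
    timespan=timedelta(days=1),
)

if response.status == LogsQueryStatus.SUCCESS:
    for table in response.tables:
        for row in table.rows:
            print(row)
```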

Of course, we also need to talk about the tasks that fall outside the Jagged Frontier. Again, these can vary from person to person. From my usage and experiments so far, these are the tasks that currently fall outside the frontier:

  • Being responsible for technical support tickets, where a customer hit an error or has a question about our product. This involves answering the ticket, asking clarifying questions when necessary, opening related tickets with a 3rd party, and then resolving the issue.
  • Deep, valuable code review. This includes good insights, suggestions, and knowledge sharing that improve the PR author's skills. CodeRabbit does often give valuable code reviews, way better than any other solution, but it's still not the same as a human review 🙂
  • Development of a v0 (or draft) for new complex features
  • Fixing bugs that require business domain knowledge

Delegating some of those tasks, or at least 50% of them 😄, would be cool, while our engineering team focuses on other work. But oh well, maybe that day will come.

AI-assisted coding

AI-assisted coding can be very helpful on some tasks, and lately my goal has been to increase the number of tasks AI can assist me with. In our team, we've read the Claude Code best practices post to learn what fits our use cases best. Then we dove deeper into some topics that post references; for example, these docs were very useful for learning about Claude's extended thinking feature, which complements the usage of "think" < "think hard" < "think harder" < "ultrathink". We also found Simon's post about this feature interesting.
For most tasks, an iterative approach, just like normal software development, works way better than trying to one-shot it with the perfect prompt. Still, if it takes too many iterations (some bugfixes were simply too complex because it's hard to pinpoint the bug's location), it loses performance and the whole thing goes bad (infinite load spinner of death 🤣).
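
As a side note, extended thinking is also exposed directly through the API, not just through those magic words in Claude Code. A minimal sketch with the Anthropic Python SDK; the model name, token budget, and question are illustrative:

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative model name
    max_tokens=4096,                   # must be larger than the thinking budget
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[
        {
            "role": "user",
            "content": "Walk through possible root causes for an intermittent "
                       "409 when two workers update the same blob.",
        }
    ],
)

# The response interleaves "thinking" blocks with the final "text" blocks.
for block in response.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking[:200], "...")
    elif block.type == "text":
        print(block.text)
```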

Before we can use AI-assisted coding on more complex tasks, we need to improve the output quality. So we've invested a lot of time in fine-tuning our custom instructions and in meta-prompting. Let's talk about these two.

Custom instructions

According to the Copilot docs, instructions should be short, self-contained statements. Most prompt engineering principles come down to being short and specific, and making sure the model pays special attention to our critical instructions.
As everyone keeps saying, the context window is very important, so it's really good if we can keep the instruction file to around 200 lines. The longer our instructions are, the greater the risk that the LLM won't follow them, since it may pay more attention to other tokens or forget relevant instructions. With that said, keeping instructions short is also a challenge when we use the few-shot prompting technique and add more examples.

To build our custom instructions, we used the C# and Blazor files from the awesome-copilot repo and other sources of inspiration, like the parahelp prompt design, to get a first version. We wanted to know what techniques other teams use. Then we made specific edits to follow our own guidelines and removed rules specific to explaining concepts, etc.
We also added some capitalized words that are common in system prompts and commands, like IMPORTANT, NEVER, ALWAYS, MUST. An IMPORTANT line also sits at the end of the instructions, to try to refocus attention on coding standards:

```
IMPORTANT: Follow our coding standards when implementing features or fixing bugs. If you are unsure about a specific coding standard, ask for clarification.
```

I'm not 100% sure how this capitalization works, or why it works... and I haven't found docs/evidence/research on it. All I know is that capitalized words tokenize differently than their lowercase forms. It's probably something the model pays more attention to, since in the training data these words usually signal that something is important. I do wish Microsoft, OpenAI, and Anthropic covered capitalization in their prompt engineering docs/tutorials.
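
One way to at least see the tokenization difference is to run the words through a tokenizer. A small sketch using tiktoken, an OpenAI tokenizer used here only as a stand-in, since every model family has its own:

```python
import tiktoken

# cl100k_base is the tokenizer used by several OpenAI models; other models
# (Copilot's backends, Claude, ...) tokenize differently, but the point stands:
# different casing produces different token id sequences.
enc = tiktoken.get_encoding("cl100k_base")

for word in ["important", "IMPORTANT", "never", "NEVER", "must", "MUST"]:
    print(f"{word!r:14} -> {enc.encode(word)}")
```

Whether those different tokens actually get more attention is exactly the part I haven't found documented.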

It's at the end of our file because research also suggests that the beginning and end of a prompt are what the LLM pays the most attention to and finds most relevant; some middle parts are "meh" and can be forgotten. Microsoft's docs say essentially the same thing, where it's known as "recency bias". In most prompts we see, this kind of section sits at the end to refocus the LLM's attention.

Meta-prompting

Our goal also isn't to have the perfect custom instructions and prompts, since refining them later with an iterative/conversational approach works well. But we came across the concept of meta-prompting, a term that is becoming more popular. Basically, we asked Claude how to improve our prompt, and it gave us some cool ideas for improving our instructions and reusable prompts.
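
A minimal sketch of what that meta-prompting step can look like (Anthropic Python SDK; the file path and critique criteria are just illustrative, adapt them to whatever prompt file and guidelines you use):

```python
import anthropic

client = anthropic.Anthropic()

# Any prompt/instructions file works; this path is just a common Copilot location.
draft_instructions = open(".github/copilot-instructions.md").read()

meta_prompt = f"""You are reviewing a custom instructions file for an AI coding assistant.
Critique it against these criteria and propose a revised version:
1. Short, self-contained statements.
2. No contradictory or ambiguous rules.
3. Critical rules (IMPORTANT/NEVER/ALWAYS) are few and placed where they get attention.

<instructions>
{draft_instructions}
</instructions>
"""

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative model name
    max_tokens=2048,
    messages=[{"role": "user", "content": meta_prompt}],
)
print(response.content[0].text)
```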

But don't forget to use LLMs with caution... I keep getting "You're absolutely right..." and it's annoying how sycophantic they often are 😅

(Screenshot: an LLM conversation illustrating "trust, but verify")

The quality of the output is most likely affected by the complexity of the task I'm working on too. Prompting skills only go so far; from what I've researched and learned so far, I can say there is a learning curve to understanding LLMs. So we need to keep experimenting and learning about the layers between our prompt and the output we see.

Conclusion

I've enjoyed learning and improving myself over the years. But with GenAI, I now feel like I can learn a lot more and improve myself even further, since I'm choosing these models as augmentation tools.
Hopefully, this article motivates you to pursue AI augmentation for yourself. It's okay to be skeptical about all the hype you watch and hear around these tools; it's a good mechanism for not falling for all the sales pitches and fluff that CEOs and others in the industry talk about. Just don't let your skepticism stop you from learning, experimenting, building your own opinion, and finding ways to improve your work 🙂.

Still... I can't deny my curiosity to know more about how these systems work underneath. How is fine-tuning done exactly? How does post-training work? Can these models emit telemetry (logs, traces, metrics) that we can observe? Why does capitalization (e.g. IMPORTANT, MUST) or setting a role/persona improve prompts? Can we really not have access to a high-level tree with the weights the LLM uses to correlate tokens, and use it to justify why a given output was produced? Or why an instruction given as input was not followed?
It's okay to just have a basic understanding and know about the new abstractions we get with these LLMs. But knowing how that abstraction works underneath is what leads to knowing how to transition to automation.

I will keep searching and learning in order to answer these questions, or to find engineers in the industry who have answered them. Especially around interpretability research, which is amazing!!! I recommend reading this research, for example: Tracing the thoughts of a large language model.
Hope you enjoyed reading, and feel free to share in the comments below how you use AI to augment yourself 🙂.
