DEV Community

Cover image for I Taught OpenAI a New Programming Language - Fine Tuning
Thomas Hansen
Thomas Hansen

Posted on

I Taught OpenAI a New Programming Language - Fine Tuning

It took a month, and we had to create 80,000 lines of "throw away code", but we pulled it through. We taught OpenAI's LLM models a new programming language by using fine tuning, and it's scoring at 80 to 90 percent accuracy.

Fine tuning allows you to teach an LLM new things. For 99% of all practical use cases it's completely useless, and you're simply much better of using RAG and VSS. However, for some jobs it's the only way to achieve your goal.

The problem with fine tuning is that it's ridiculously resources intensive. You have to have thousands of examples of facts to seed the LLM with. To teach GPT-40-mini a new programming language for instance, you'll need the following training data.

  • 3,500 snippets, where we are now, will give you 80% accuracy
  • 5,000 snippets, where we'll be in a month, will give you 95% accuracy
  • 10,000 snippets, where we'll hopefully be in a year, will give you 99% accuracy

I had about 300 files from before. These were the system files for the platform itself. In addition, I was able to use ChatGPT to generate "variations" of my examples. This allowed me to create one example snippet and tell ChatGPT to create multiple variations of it. The third trick we applied was to expose our documentation as RAG to a custom GPT, allowing us to prompt it as follows.

Use your action to search for xyz and generate 5 different examples

This allowed us to further generate a lot of Hyperlambda examples to fine tune.

However, we're now at the point where we can generate functioning backend and API code literally using natural language, something you can see in the above screenshot. If you want to try it out, you can test it at our documentation website by stating for instance ...

"Generate Hyperlambda code that wraps an endpoint sending emails and saving these emails into my crm and emails database / table having a content column"

... and the thing will actually do just that!

Top comments (0)