RoboCoach

Looking inside GPT-Synthesizer and the idea of LLM-based code generation


GPT-Synthesizer is an open-source tool that uses GPT for software generation. In this post, instead of talking about releases and features, I want to dive deep into how GPT-Synthesizer works under the hood and explain some of the high-level ideas behind this project. Further, I want to discuss the strengths and weaknesses of LLM-based code generation tools, and speculate on how they will evolve in the future.

Are LLMs good for code generation?

Nowadays everybody is using LLMs (Large Language Models) for everything, and for good reason: they are the shiny new technology, and they are extremely powerful tools. We are all excited to explore where and how we can use them, but that doesn't mean they are the best tools for each and every job. LLMs are made for interaction through human language, and that's where they really shine. Take ChatGPT as an example, where both the inputs and outputs are in human language. In code generation, on the other hand, the generated code isn't in natural language. It's in Python, C, or other programming languages, with well-defined syntax and rigid semantics. All programming languages were made for human programmers to describe their intent to the machine in a clear and deterministically interpretable format.

Since software isn’t written in human language, why should we use LLMs for software generation? To answer this, we should recognize that there are two sides to software generation: (1) the input: capturing the spec, (2) the output: generating the code.

The generated code isn’t in human language, but the input spec is. LLMs aren’t the best tools for code generation, but they are amazing at understanding the intent. That’s where they shine, and that’s where the focus of their application should be. In GPT-synthesizer the main focus is on understanding what exactly the user wants to do. The code generation itself is the smaller piece of the puzzle, and isn’t the main focus.
This doesn’t mean that LLMs are necessarily bad at code generation. LLMs such at GPT4 are so powerful that they can do a decent job of it. With throwing so much raw power at it, LLMs can basically solve the problem by brute force. However, the code generation is not the strength of the LLMs or LLM-based software generation tools. The strength comes in communicating through the medium of natural language to capture the spec. This is where the focus of any LLM-based software generator should be, and this is where we put our thoughts and efforts when we made GPT-synthesizer. So let’s take a deeper look into how GPT-Synthesizer actually works.

How GPT-Synthesizer works

The process of software generation in GPT-Synthesizer can be explained in three steps:

  1. Component synthesis
  2. Component specification & generation
  3. Top-level generation

Component synthesis:

First, GPT-Synthesizer reads the programming task provided by the user in the initial prompt and breaks it into software components that need to be implemented. We call this step component synthesis. Then, GPT-Synthesizer shows the user the compiled list of components along with their descriptions, and asks the user to finalize the list by adding or removing components. The idea here is to keep the user in the driver's seat by asking for their confirmation.

Ultimately, it is not the tool that invents the software; it is the user, utilizing the tool, who is in charge of the project. Figure 1 shows how GPT-Synthesizer identifies a list of components during component synthesis.

Figure 1. Component synthesis
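
To make the flow concrete, below is a minimal Python sketch of what component synthesis could look like. This is an illustration, not GPT-Synthesizer's actual source: `llm` is a hypothetical helper standing in for whatever LLM API is used, and the prompt wording and output format are invented for the example.

```python
def llm(prompt: str) -> str:
    """Hypothetical LLM call; wire this to your provider of choice."""
    raise NotImplementedError

def synthesize_components(task: str) -> dict[str, str]:
    """Ask the LLM to break the task into named components with descriptions."""
    prompt = (
        "Break the following programming task into software components.\n"
        "Return one 'name: description' pair per line.\n\n"
        f"Task: {task}"
    )
    components = {}
    for line in llm(prompt).splitlines():
        if ":" in line:
            name, description = line.split(":", 1)
            components[name.strip()] = description.strip()
    return components

def confirm_components(components: dict[str, str]) -> dict[str, str]:
    """Keep the user in the driver's seat: show the list, let them trim it
    (adding components would work similarly)."""
    for name, description in components.items():
        print(f"- {name}: {description}")
    removals = input("Components to remove (comma-separated, or blank): ")
    for name in removals.split(","):
        components.pop(name.strip(), None)
    return components
```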

Component specification & generation:

For every component identified and finalized in the previous step, GPT-Synthesizer captures the intent from the user; only when the intent is completely clear does it implement that component. Capturing the intent involves an elaborate process of prompt engineering that we call prompt synthesis. This is the heart of GPT-Synthesizer, where the LLM's strong suit is put to use: processing conversations and generating questions, all in natural language.

Figure 2 shows the process of prompt synthesis, in which GPT-Synthesizer uses a summary of the chat history plus top-level information about the task, the output language, and the software component to generate a prompt that is fed to the LLM to create a follow-up question. This process continues in a loop until the spec is clear and the user has provided the necessary details about the design.
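
This loop can be sketched as follows. Again, this is an illustration under the same assumptions: `llm` is the hypothetical helper from the previous sketch, and the stop condition and prompt wording are invented for the example.

```python
def clarify_component(task: str, language: str, name: str, description: str) -> str:
    """Prompt synthesis: loop until the component spec is clear, then return it."""
    history: list[str] = []
    while True:
        summary = llm("Summarize this design conversation:\n" + "\n".join(history))
        # Combine the chat summary with the top-level information
        # (task, output language, component) into a synthesized prompt.
        prompt = (
            f"Task: {task}\nOutput language: {language}\n"
            f"Component: {name} - {description}\n"
            f"Conversation so far: {summary}\n"
            "If any design detail is still unclear, ask ONE follow-up question. "
            "If the specification is complete, reply only with DONE."
        )
        question = llm(prompt)
        if question.strip() == "DONE":
            return summary  # the captured spec, distilled from the conversation
        answer = input(f"{question}\n> ")
        history.append(f"Q: {question}\nA: {answer}")
```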

The idea here is not just to keep a human in the loop, but to keep them in the driver's seat. We want the user to make decisions on the details of the design. We made GPT-Synthesizer as a programming assistant that can be used in the early stages of software design to create a draft (a blueprint) of the software project. GPT-Synthesizer explores the design space and identifies the unknowns; it holds the user's hand as it walks through the design space, sheds light on the design unknowns, brings them to the user's attention, provides suggestions on those details, and asks the user for clarification and confirmation on design details.

For a less experienced user, who wants to write software but doesn't know where to start or what goes into writing it, GPT-Synthesizer can act like a coach: someone who turns the unknown unknowns into known unknowns.

Finally, when the component spec is clear and all the design details are resolved, GPT-Synthesizer generates the code for that component. Figure 3 illustrates the component generation step.

Figure 2. Component specification using prompt synthesis

Figure 3. Component generation
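
Once the spec is settled, component generation reduces to a single well-grounded prompt. Here is a hypothetical sketch reusing the `llm` helper from above; the file-naming scheme is an assumption of this example, not the project's.

```python
from pathlib import Path

# Output file extension per language (an assumption for this example).
EXTENSIONS = {"Python": ".py", "C": ".c"}

def generate_component(name: str, spec: str, language: str,
                       out_dir: str = "workspace") -> Path:
    """Feed the finalized spec to the LLM and write the generated code to disk."""
    prompt = (
        f"Implement the component '{name}' in {language}.\n"
        f"Specification:\n{spec}\n"
        "Return only the code, with no commentary."
    )
    code = llm(prompt)  # hypothetical helper from the earlier sketch
    path = Path(out_dir) / (name + EXTENSIONS.get(language, ".txt"))
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(code)
    return path
```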

Top-level generation:

At the end, GPT-Synthesizer creates the top/main function, which acts as the entry point for the software. As of now, this step is only supported for Python.
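
A sketch of this last step, under the same assumptions as the previous examples:

```python
def generate_top_level(task: str, components: dict[str, str]) -> str:
    """Ask the LLM for a main() entry point that wires the components together."""
    listing = "\n".join(f"- {name}: {desc}" for name, desc in components.items())
    prompt = (
        f"Task: {task}\n"
        f"Generated components:\n{listing}\n"
        "Write a Python main() entry point that imports these components "
        "and wires them together. Return only the code."
    )
    return llm(prompt)
```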

By now, you can see that the heart of GPT-Synthesizer is not the code generation, but rather the component synthesis and prompt synthesis; GPT-Synthesizer's strength is in capturing the specification through a conversation in natural language, where LLMs are at their best.
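
Tying the sketches together, a hypothetical end-to-end driver would mirror the three steps above (again, an illustration of the flow, not the project's actual code):

```python
task = "Build a command-line TODO list manager"  # example initial prompt

components = confirm_components(synthesize_components(task))    # step 1
for name, description in components.items():                    # step 2
    spec = clarify_component(task, "Python", name, description)
    generate_component(name, spec, "Python")
print(generate_top_level(task, components))                      # step 3
```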

Lessons we learned from GPT-Synthesizer

The following remarks summarize the lessons we learned from the development of GPT-Synthesizer:

  • The strength of LLM-based software generation tools is in capturing the spec, and the spec cannot be captured efficiently in a single prompt.
  • The human should remain in the driver's seat and control the design process.
  • Good prompt engineering is key to capturing design details from the user, and the LLM's output is only as good as its prompts.

Now, I would like to step away from GPT-Synthesizer for a bit, and speculate on what I think is the future of programming languages in the presence of LLMs.

The future of programming languages

Programming languages are relics of a past in which machines couldn't understand human language, with its complex, irregular, and ambiguous structures. That has changed now. For the first time in computing history, computers can understand us just the way we speak, and there is no need for us to speak to them in their language.

So what will happen to programming languages then? Are they going to vanish completely? I believe it will take years, maybe even decades, for programming languages to gradually phase out and be replaced by human language. It's a matter of the quality of the generated code, the power efficiency of the LLM tools, and the legacy of existing software written in programming languages. Eventually, these matters will sort themselves out; natural language will become the only interface between humans and machines, and programming languages will remain only as intermediate formats inside the tools.

When computers first came out, we had to talk to them in 0s and 1s, which were then replaced by assembly language. Later, we took one step further from the machine language and described our intent in higher-level languages like C and Pascal, relying on compilers to translate our intent into machine language.

For some time, if you wanted your software to run efficiently, you had to manually modify the compiler-generated assembly code, or skip the compiler altogether and write your assembly by hand. Over time, as compilers got better, smarter, and more optimized, the generated assembly got better and better. At the same time, with transistor scaling as well as innovations in computer architecture, processors became more powerful; therefore the inefficiency of auto-generated assembly became less of an issue. Meanwhile, advances in chip design and manufacturing technologies improved the capacity and speed of both on-chip and off-chip memories, allowing programmers to be more lenient about the size of the generated assembly. Eventually, the combination of these advancements shifted the balance from hand-writing the most optimized assembly code to saving development time and effort by trusting compilers.

With the success of programming languages and compilers, we took more steps away from machine language and used even higher-abstraction-level languages like Python or MATLAB to communicate with machines. Now, with the invention of LLMs, we are taking one last step and switching completely to our own language to interface with machines.

I expect the same scenario to play out with trusting LLMs for our code generation. Over time, LLMs will become more powerful, more efficient, and better integrated with current ecosystems, generating better software. At the same time, the processing power and data capacity of cloud services will grow, and communication speeds will improve, driving down the cost per unit and allowing more forgiveness on the efficiency of the LLM process and the quality of the generated code. It could take several years, but I believe we will gradually take our hands off programming languages and trust language models to handle them.

I don’t expect programming languages to vanish completely. I think they will exist as an intermediate format the same way that the assembly language exists today. I would also predict that there will be a lot of consolidations in that space and only few languages will survive this transition. The traditional compilers and many other legacy softwares can coexist behind the scene and work under LLMs command.

It is somewhat easier to think of LLMs not as AI programs, but rather as human experts who can understand our requirements in human language and utilize other tools, such as legacy software (e.g., compilers, synthesizers, converters, traditional AI tools), to get the job done.

These are my opinions and speculations regarding the future of LLMs. I am curious to learn about your thoughts on this matter. Please feel free to comment on that.

About GPT-Synthesizer

We made GPT-Synthesizer open source in the hope that it will benefit others who are interested in this domain. We encourage all of you to check out the tool and give us your feedback here, or by filing issues on our GitHub. If you like GPT-Synthesizer or the ideas behind it, please star our repository to give it more recognition. We plan to keep maintaining and updating this tool, and we welcome all of you to participate in this open source project.

About RoboCoach

We are a small early-stage startup based in San Diego, California. We are exploring the applications of LLMs in software generation as well as some other domains. GPT-Synthesizer is our general-purpose code generator. We also have an open source product for special-purpose code generation in the robotics domain, called ROScribe. You can learn more about these tools on our GitHub.
