DEV Community

GPT Pilot - a dev tool that writes 95% of coding tasks

zvone187 on September 06, 2023

An MVP for a scalable dev tool that writes production-ready apps from scratch as the developer oversees the implementation. In this blog ...


Jon Randy 🎖️ • Sep 7 '23 • Edited

Impressive stuff. However...

"Writing code" with GPT or similar tools is like going to the gym and having a robot do all the exercises. What do you gain? Very little. It baffles me that any dev would want to work this way. Do you not enjoy the mental stimulation of coding and solving things yourself? Isn't that why you got into this field?

Don't get me wrong, I see value in AI based tools to assist you (autocompleting based on your previous code & project codebase etc.) - I've been using TabNine almost since it existed.

I also see the attraction in building this kind of tool - I get it, it's interesting tech and you're making it do interesting things. I enjoy playing with new toys too.

Using it to carry out 95% of the tasks that I really like doing though? No thanks. That would be job satisfaction and enjoyment straight down the drain.

zvone187 • Sep 7 '23

Hm, yea, I see your point. I definitely enjoy working on this. As for why anyone would use this: imagine being able to build a full-blown app in 1 week. Imagine shipping a new feature every day.

This kind of workflow might be worrying since none of us are used to it, but software development will change drastically in the coming years. Whether it will be something like GPT Pilot or something else, we'll see, but we definitely won't work like we do today. I think of it like people used to have control over the memory allocation. Today, you almost never think about that. There are memory leaks from time to time but that's so rare that you don't worry about memory anymore.

Jon Randy 🎖️ • Sep 8 '23 • Edited

I think of it like people used to have control over the memory allocation. Today, you almost never think about that. There are memory leaks from time to time but that's so rare that you don't worry about memory anymore.

Unfortunately, it's not rare at all, and the fact that a mentality has developed whereby almost no one cares about resource usage has brought software quality and development to the dire place they are in now. Developers still have control over how they use resources; most are just blissfully unaware of anything in this area, and are oblivious to the fact that there are even problems (or that their own actions may be creating said problems).

This kind of generative AI driven workflow is worrying because overall it will only accelerate the decline in quality of upcoming developers, and will essentially turn a lot of us into code reviewers for the generated code (which is often of dubious quality). There will also be a shrinking pool of people who can actually sensibly review this code, due to the increasing shortage of competency created by over-reliance on these technologies.

Overall innovation and original work will also be stifled, as all the good quality, interesting work will be buried under an avalanche of mediocre 'content' (in this case, apps that are 'functional' but likely poorly understood by the people that built them). This phenomenon is visible everywhere (DEV.to being a prime example... there used to be mainly good quality content on here) - and is ruining things in so many areas at an alarming rate.

We're told this is all fine though, because the 'business' guys love this stuff as it is made quickly and cheaply.

I wish I knew how to stop the rot, but it seems like a futile war against $$$.

zvone187 • Sep 8 '23

I agree with you about the decline in developers who can write high-quality code. I'm not sure how that will play out, since GPT Pilot does need a developer to be present. I think that kind of change is on a long-term horizon (10, 20 years), and who knows what will happen by then. Maybe we get to AGI and the whole world turns upside down.

But you do make good points. I guess we'll see what the future holds.

Maarten Nieber • Sep 15 '23

I think a tool such as GPT Pilot changes the game, but it still requires skill. An unskilled person will use GPT Pilot to incrementally build a codebase that becomes harder and harder to understand and maintain. The human in the loop still matters (until we reach the point where it no longer matters, but that might take a while).

winisoft • Sep 11 '23 • Edited

So... machines are things that make work easier. And gyms are places humans go to voluntarily do unproductive work to restructure our tissue as biological organisms to a better condition for more work later. ...Weird analogy you're crafting there.

Why use AI to write software? Because we have many problems that require a lot of inexpensive labor thrown at them to solve. What about that means the work is not a constructive kind because the big metal plates just fall back where they were when you stop applying kinetic energy and relocate them? If we were building a water tower I wouldn't instruct an AI helping such that "and then once you get the water up there just slowly return it to where it originally was and we'll move on to radio towers to end the day so we're fresh for 'suspension bridge day' tomorrow"...?

Apps written through generative AI are not now and never were an intended solution for some overblown "dumbbell average altitude above sea level" crisis, any more than weight lifting was ever a proposed solution to this "help wanted: lifting metric tons of plastic" sign we put up right before the Pacific Ocean ruining the view. But if nobody's going to pick up the garbage for us I guess I can resist the urge to look down on a robot laborer written for this purpose that gets written on my behalf for free? And I'm for sure not ragging on him when he's done for his totally unswolled gains after all that time trying to bulk with like no protein formula in his routine.

Levelleor • Sep 12 '23 • Edited

I believe you got it wrong. The idea is that AI should not be writing 95% of the code, thereby reusing the quality-declining code that already exists on the internet, but should instead fill in the gaps by handling repetitive, inexpensive work.

That's exactly why GPT, Phind and GitHub Copilot shine among developers: they provide good boilerplate, debugging help and short code snippets to be reviewed and reused. It's like a Stack Overflow replacement. But I wouldn't trust AI software to take over the whole process, only chunks of it that can be steadily monitored.

Tushar Dwivedi • Sep 12 '23

More like having a robot to rack the weights, do almost reliable spotting, so that you can focus on the "workout".

Jon Albor • Oct 14 '23

There is more than one type of coffee maker. I like pure espresso, others like decaf, some like lattes and others like light roast. One coffee maker can do all of that, but there are coffee makers that just make one type of coffee.

As engineers we need to both specialize and build coffee makers. If an all-in-one coffee maker is required, then you build an all-in-one coffee maker, but if you need to specialize in espresso coffee makers, then you can do that too.

Using AI to build coffee makers still lets me "enjoy the mental stimulation of coding and solving things yourself" because I still have to tell AI what I want my coffee to taste like. I'm still the architect of the end result.

I'm just a lot more efficient building coffee makers.

Dave • Sep 11 '23

Hello! Yes, I also think it helps here and there, but I have become a final developer. Well, I love to do it myself and code.

Matija Sosic • Sep 6 '23

This is an amazing idea, thanks for sharing! I've seen quite a few coding agents, but I think this is the correct approach: an LLM collaborating with the developer and asking for help when it gets "stuck". Congrats on the launch!

How long did it take to build it, what was the biggest challenge?

zvone187 • Sep 6 '23

Thanks @matijasos! Yea, I also think that this approach should yield a big productivity boost for devs. It took us 3 weeks to build this MVP.

The biggest challenge is just tuning the prompts (not fine-tuning) to point GPT-4 in the right direction. I think this is the biggest piece of work that needs to be done going forward: figuring out which prompts work best.

Andre Du Plessis • Sep 8 '23 • Edited

I guess the prompts one feeds to LLMs will always be the greatest challenge. We struggle as humans to communicate explicitly with each other, so trying to tell an LLM exactly what we want will be as big a challenge, if not bigger, especially if we start pushing the boundaries a bit. It does make for very interesting and exciting times though. And the sooner we learn how to apply this tech responsibly and efficiently, the sooner we as a human species will be able to stop looking and fighting "inward" and start turning around as a collective to go out and explore outward.

Watching "The Next Decade of Software Development" by Richard Campbell (NDC London 2023), I followed with great interest as he showed, in an engaging way, where we started some 40-50 years ago, and pointed out that we are at the end of "Silicon Street". Soon we won't have atoms left to make smaller transistors with.

And while quantum computing is still in its very infancy today (comparable to the memory and processing capabilities of the first computers), I am positive that "tomorrow" we will be set on a similar path of growth in knowledge and technology when quantum computing truly kicks in.

There's a whole galaxy out there that can be discovered and explored over the next few centuries, and after that our known universe. It's time we stop our little "pre-school tantrums" and look at "primary school". There's so much to learn and gain, versus losing a stupid classroom fight. Imagine where we can be once we reach some modicum of "adulthood" in our approach to being human?

To take 3D printing to the next level, from creating all kinds of stuff from raw materials, imagine what we could do once we can start printing with base element atoms. What about warp engines? The micro-experiments scientists were working on just last year make the dream of going to WARP 9.5 far more attainable if we harness our energies and focus them on teaching our creations to do some hard thinking at "double quantum-time"! LOL

Maxim Saplin • Sep 7 '23

Looks promising, great work! Just an hour ago I wrote that AI tools are good at small/simple stuff, and here we go :)

I'd be curious to see how this project evolves: what kinds of tooling/IDE integrations there can be, how it can fit into the existing workflows of teams that maintain codebases, etc.

zvone187 • Sep 7 '23

Yea, great question! We've actually created a VS Code extension which should launch soon - maybe even next week. The way that I see this being used is definitely within the IDE (likely an extension) so that it can assist you when you need to intervene and change some code (which the developer will definitely need to do from time to time).

Maxim Saplin • Sep 7 '23

Looking forward to trying the VS Code extension! Btw, in May I published one of my own, also an AI coding assistant: marketplace.visualstudio.com/items...

zvone187 • Sep 7 '23

Oh, awesome, will check it out. Congrats on shipping and 4k downloads!!

Nikolas • Sep 9 '23

Very interesting read! It looks like we're on a very similar path. I would love to hear your thoughts on Codebuddy here: codebuddy.ca

I seem to have come to the same conclusion you have about the need for human oversight at every step. This is a working prototype that tries to do what you've done here, also with an array of agents but at a slightly lower level (though much higher than GitHub Copilot).

The JetBrains plug-in will be ready soon too! The underserved!

zvone187 • Sep 9 '23

Oh nice!! Congrats on the project. I tried to sign up, but it seems to request a lot of permissions on my GitHub, so I didn't feel comfortable giving away so much. Have you thought about easing that up so users can try it out more easily?

I assume you need all those permissions for some features, but it might be good to request just the basic permissions so people can see what Code Buddy is about (like I wanted to), and then request more permissions once they want to try the advanced features.

Btw, why not open source the project?

Nikolas • Sep 9 '23

Thanks!
As a matter of fact, I went to work implementing this as a result of your comment and I think it's ready! Now you can login using "limited access" and specify a fine-grained token with whatever permissions you want.

We also only really need those permissions for the website and the JetBrains plugin is nearly ready for release so... even more reason to have a reduced permissions option!

Freddy Hidalgo-Monchez • Sep 7 '23 • Edited

Impressive stuff. I know a lot of people won't agree with me, but I think this is where development is inevitably going. The time put into churning out code is huge, and businesses are not going to keep doing that if there's a faster, safer alternative that produces the same or better results. We've seen this thousands of times over.

I think as engineers we have a duty to embrace this innovation. There will always be room for humans to guide and fix the machine. It might not be with the interface or tools we have now, but our roles as tech stewards will not vanish unless we refuse to grow.

Kudos for this initiative! I've starred it and look forward to testing it out :)

zvone187 • Sep 7 '23

Thank you so much man for the encouraging words 🙏 🙏

Yea, TBH, I do think that as well. Whether it will be GPT Pilot or something else, we'll see, but I think that a dev's workflow will completely change in the upcoming years.
As you said, we should be embracing innovation, and LLMs as a technology open up so many opportunities to innovate. And I don't think anyone will lose their job if they adapt to the new tech that comes out.

Andre Du Plessis • Sep 8 '23

Hi, Zvone. Great and very interesting work!

Your work looks extremely promising as you go through explaining how you and your team applied GPT to assist in developing scalable software. May it do well and provoke more insight for you all as well as future users and contributors.

I’m excited to see that TDD is part of the implementation. What I would like to know (I know you are working on designing your own test platform, “Pythagoras”, for Node.js apps) is why not consider a more generic, decoupled TDD approach, decoupling all your tests (E2E, application, and unit tests) from any specific testing frameworks and libraries? This would allow wider adoption and give flexibility to developers who want to use GPT Pilot with whichever testing frameworks and libraries they prefer.

Another very valuable addition to consider would be implementing a DDD approach forming part of your “User-Story” phase. Significant work has been done in that field too, and I’m sure you folks could benefit from it.

Working a solid DDD approach into GPT Pilot would take this project to a tier not yet publicly available, at least to my knowledge. The thought patterns and tech are already available to enable you to do so, though. Refer to: Clair Mary Sebastian’s article on Medium (Enhancing Domain-Driven Design with Generative AI — 2023) and George Lawton’s on TechTarget (How deep learning and AI techniques accelerate domain-driven design — 2017.) There are additional articles with truly inspiring concepts on related topics on Inspiring Brilliance and TechTarget’s sites as well.

If you can include sound and “flexible” foundations for both DDD and decoupled TDD SW dev approaches, this will push endeavours like yours forward to build truly great systems guided by human developers (both business devs and software devs), capturing two key points in the industry: Design software based on Business models that work, and test the software fully in many, if not all aspects before deploying a single line. That would be a winner!

May your project go very well!

zvone187 • Sep 9 '23

Thank you @andre_adpc 🙏 Yes, I think TDD will be a big part of GPT Pilot; we just need to find time to implement it correctly and, more importantly, to research how LLMs work best with TDD. I actually started with TDD but realized that it's not trivial to get an LLM to work efficiently with it.

I didn't know about DDD, looks interesting. I think that to make GPT Pilot work really well, we'll need to test many of these concepts to see which one works best with LLMs. Same as with TDD.

Andre Du Plessis • Sep 11 '23

If you need some good inspiration and insight on fully decoupling your testing, I recommend having a look at Markus Oberlehner's book. For me personally, he has a great way of explaining the three main kinds of automated tests in human language. Translating that language into actual tests makes much more sense as he progresses. It might help you guys write the prompts and develop the LLMs.

Shachar Har-Shuv • Sep 8 '23

Can this be used in existing projects? Otherwise adoption is going to be difficult. (It's not every day that you start a new project.)

zvone187 • Sep 8 '23

Good question - not at the moment. Currently, this is research to see how many coding tasks can be done by AI. Once GPT Pilot works completely as described in this post, the idea is to create a tool that maps out an existing project by working with a developer (e.g. in 1-2 days). Once a project is mapped, GPT Pilot can continue developing it.

Shachar Har-Shuv • Sep 8 '23

That's very interesting. Looking forward to this. The idea of GPT writing PRs for me to review was intriguing from the very beginning.

Leowhyx • Sep 7 '23

Interesting post

zvone187 • Sep 7 '23

Thanks @leowhyx 🙏

otisgbangba • Sep 8 '23

Good job. Keep it flying.

zvone187 • Sep 8 '23

Thanks 🙏 🙏

Upvotebeast • Sep 6 '23

If someone crafts an HTML/CSS design automation tool, they're set for success!

Casandra • Sep 6 '23

I've bookmarked your repo for later.

zvone187 • Sep 6 '23

Thanks 🙏

Sattineez_SSC • Sep 6 '23

Thank you for sharing! It would be awesome if it could explain the code step by step; as a student and also a programmer, that would be a must for learning and staying up to date ❤️

Femundw4 • Sep 6 '23

This is captivating! The potential of GPT-4 for developers is vast. Eager to see your demo. Keep pushing boundaries! 🌟

zvone187 • Sep 6 '23

Thanks

Christopher McClellan • Sep 8 '23

Are we not going to talk about how we need to solve the causality problem to make these kinds of tools useful? A next token generator is a useful thing indeed, and will be necessary for an AI to actually write software, but without “understanding” causality these tools will continue to produce mediocre, at best, results.

tnypxl • Sep 10 '23

Most projects are mediocre at the start. I don’t think the purpose (for the millionth time) is to replace the developers or the full software delivery process with generative AI agents. The point is to spend less cognitive effort rebuilding the same basic foundation every single time.

I’d rather spend time on the actual hard problems of the project.

Mike Ritchie • Sep 23 '23 • Edited

I love the fact that you're actually applying a workflow to this, with a product manager, tech/IT lead, etc. If you could also give it samples of your previous code so that it could learn from them for the programming side, that would be amazing. The only thing I don't see being solvable is the front-end. UX design cannot be generalized unless you want to end up with yet another anonymous Bootstrap app.

I'll definitely be following the rest of the articles in this series!

Savy • Sep 18 '23 • Edited

I wanted to share my experience, as you did, and also share my Python code for anyone who would like to test it.
On the other hand, I was thinking that "code" or "coding" is not our real problem,
because we can now use WordPress, thousands of headless CMSs or no-code tools, or even search for and adopt open-source projects. Creating a five-in-a-row game with AI is more like fun research.

In more serious and important businesses, it's common for customers, managers or CEOs not to know exactly what they want. They need time and step-by-step progress, so we have to code at their pace.

In the 12 years I've been working, there were times when I deployed features so fast that customers, CTOs or CEOs didn't have enough time to test them. Many times they felt offended and threatened, because they thought I might be making decisions independently without respecting their decision-making.

So releasing new features every day is too fast and can lead to problems. Big applications also need some stability, and this stability is the main source of trust.

One of the biggest problems for programmers, and for today's world, is junk data and junk information.
So maybe one of the most useful applications of AI would be the ability to search for and find the needed piece of data in the current jungle of shit code.

I think the biggest problem with GPT is its memory, at least for now.
An 8k-64k word memory isn't enough for big projects, unless we find a way to decouple the whole project into smaller parts.

About the future, I totally agree with you. It will change the way we code.
Let's see what happens.
At least I'm sure it won't be boring, and it could even be beneficial for programmers.

George Johnson • Oct 5 '23

I think this is very important work to do as we need to keep pushing back boundaries and see what technology can do for us, it's why we're all in this business. I hope to see some interesting results once more people start to play with it, "Never underestimate the ingenuity of a user.".

However, AI is fraught with moral landmines; morality and ethics are fast being pushed aside in the race to utilise and maximise AI's capabilities as they evolve. We can't stand back though; if we do, someone else will jump in ahead of us. Everyone wants to talk about "AI and ethics", but no one dares, as it's a fast-evolving tech and even an hour spent considering "we simply did 'cos we could, we didn't think if we should" is time lost in the race.

Then we come to legalities. Say I use your AI code generator app, it writes code, I put it to work in a manufacturing plant, and 2 people lose limbs. Will you indemnify me? Can I sue you on their behalf? You said the generator was sound and the code would be fine, but it failed on a critical path. Are you responsible, as it was your app that interpreted my prompts? Do I sue the AI provider underpinning your app? Knotty questions for sure!

As humans we love challenges; we actually enjoy the struggle, and making things too easy simply makes people wonder whether they should even bother anymore. Will AI simply kill coding? Not entirely, but if all coding is to become (and we all know it will one day) simply a matter of "ask and ye shall receive in abundance", then I think we will lose one of the great mental challenges humans enjoy on a daily basis.

I guess my other major concern would be the quality of the source material. We've all seen the memes that coding in the 2020s is simply a matter of having access to StackOverflow, but that brings its own host of issues. If apps and code are just thrown together, that's fine for testbeds, but full production apps and code need rigorous testing to ensure they're safe, especially if the code is to be put to work in the real world, where lives could be at stake on the outcome of AI-generated code. The AI will not care if the code fails and 50 people die; it has no guilt or conscience.

I still think this is valuable work. It's important that this is explored and documented publicly as you have done, and I'm grateful that you have gone public rather than letting this lurk in some backroom and suddenly appear. It gives time to pause (not too long!), consider lots of angles and discuss.

Levelleor • Sep 12 '23

The examples of work done by this software show some very simple applications. Would it ever be able to do more complex work? I feel like the use cases for such software are quite limited. No enterprise would ever want to replace its entire senior squad with ChatGPT to develop an alarm; rather, they would fill in the blanks for repetitive tasks with GitHub Copilot or similar.

This could be useful for more casual people, who don't fully understand the whole process, but since this software relies on expertise of a developer to review code it doesn't look like there are any real use-cases to bring anything production-ready with such tool.

And I agree with the points other folks brought up. The existence of such a tool lowers the long-term quality of end products and the number of experienced candidates. Someone will always be required to oversee the process, and in the next couple of years, with such a process in place, we'll just run out of seniors, because juniors are no longer needed to do the work.