DEV Community

Marcio Frayze


What it was like to spend a month using GitHub Copilot and why I plan to not use it (next month)

You're probably already familiar with GitHub Copilot: a tool created by Microsoft to help developers by providing real-time code suggestions. While tools like ChatGPT are also capable of generating code, GitHub Copilot takes advantage of tight integration with the code editor.

There are several ways to use it: you can describe a feature in a comment and the tool will generate a function, method, or even a complete class that implements what you described. Or you can type the name of a function or method and let the autocomplete try to predict the corresponding code. Or you can simply use it as a slightly "smarter" autocomplete for smaller parts of your code.

I don't trust the first way (using only comments) and would never recommend it. The second, writing just the name of the function or method, usually works surprisingly well for snippets with a small, closed scope, such as a function that receives a data structure and performs some simple transformation or filtering. The last way, as a more "efficient" autocomplete, had mixed results. Sometimes it hit the jackpot! 🎯 But it wasn't uncommon for suggestions to need adjustments, or to be just plain wrong.

Throughout May 2023, I used this tool to program in Clojure and Elm. In this article, I tell you a little about what I thought of this experience.

Why I decided to give it a try

In April 2023 Kent Beck posted the following message on Twitter:

Kent Beck: I’ve been reluctant to try ChatGPT. Today I got over that reluctance. Now I understand why I was reluctant. The value of 90% of my skills just dropped to $0. The leverage for the remaining 10% went up 1000x. I need to recalibrate.

This tweet was so successful that he ended up writing a post to explain his opinion in more detail.

For those who don't know him, Kent Beck is one of my favorite technical book authors, having written books such as Extreme Programming Explained, Planning Extreme Programming, and Test-Driven Development: By Example, among others.

In April 2023 I was also reluctant to try ChatGPT, but after reading this tweet, I immediately opened it and started testing. It soon became clear that I was facing something very different from anything I had experienced before. I was very impressed and spent hours on various experiments, including code generation.

After a day of testing ChatGPT, one thing stuck in my head: if it manages to generate code so well when it wasn't even created for that purpose, imagine what it must be like to use GitHub Copilot!

To my joy, I remembered that the company I work for was selecting a group of people to test the tool the following month, in partnership with Microsoft. I rushed to get myself onto that team. I couldn't wait to get started! 🎉

First impressions

Initially, I didn't know (and, I confess, I still don't know) exactly how to use this technology. As I reported at the beginning of the article, there is no formula, and each person can adapt to their way of working. But it was very impressive. I started small, writing functions in Clojure (backend) and Elm (frontend) in well-defined contexts, and it felt like it could read my mind! I wrote the name of the function and... the implementation appeared! 🧙🪄

It looked like I was facing a great revolution, and that the way I programmed would never be the same. I felt more productive, as if I no longer wasted time on boilerplate and repetitive code in general. Now I could focus on what matters!

But little by little I noticed some not-so-cool things, both in the generated code and in my behavior.

Confusing and non-standard code

I used Copilot in a web application that had been in production for over 6 months, with many screens and services that already had a clear design pattern. And while Copilot can adapt, there are limitations. Suddenly I began to notice that the construction of some functions was quite odd, adopting mechanisms that I didn't even know about.

When I realized that, I tried to see it as a good thing: Wow! I didn't know it was possible to do that. How cool, it's helping me learn new things! And it was true. I learned new things because of Copilot. And, of course, I could reject those suggestions and rewrite them as I wanted.

On the other hand, it's very easy to get carried away. Gradually I began to feel more and more comfortable accepting the proposed suggestions. I still analyzed each solution, but less judiciously. At some moments it even generated code that I didn't fully understand. It wasn't necessarily complex, but it used language tricks I wasn't familiar with. In some cases, I did some testing (both automated and manual) and it worked. What should I do in these situations? I had fully working code, but I didn't quite understand how it worked. I even accepted the solution a few times, telling myself I would come back later to study that part (but, of course, I never had the time and inclination to do so).

At the same time, I felt that the fact that I was able to write code more quickly was worth it.

Could this be the future? Writing code that we don't quite understand? Perhaps I should create more automated tests and trust that the code is correct since the tests are passing? But wouldn't I use this same technology to write the tests? Do I want to program this way? 🤔

Many questions started running through my head. Until the worst happened.

Oops, I put a bug in production and didn't even notice! 🐛

When I'm programming in Elm, it's not uncommon to start writing not very organized code, to test my ideas. Once I have the screen ready, displaying everything the way I want, I reorganize the code. This usually works great, in part because it's a statically typed language with an amazing compiler. I'm able to change the code and let the compiler guide me, and usually when the code compiles, my application still behaves as expected.

I'm a big fan of automated tests, but for this part of the application, there were no complex rules. All I had to do was make a REST call to the backend and display some formatted information on the screen. So my test coverage was low.

And it was during one of these modifications that something strange happened.

The problem I was solving

I won't go into too many business details, but I was working on a new functionality for an internal support web page (similar to a dashboard), where it's possible to consult various reports about a given system. This system provides a series of services to other third-party systems, and I needed to expose the usage metrics of these services through a page, with tables, graphs, etc.

Internally, there is an identification table for these third-party systems, containing (among other things) an identifier (an integer) and a description (a string).

To do this, I built a new endpoint in the backend that returned a JSON document containing two lists: one was a simple key-value list, containing each third-party system code and its respective name/description. The other contained some metrics about the accesses made by these systems, grouped by month.

In a simplified form, the returned JSON was something like this:

```json
{
    "hits": [
        {
            "monthYear": 1633057200000, // date as Unix epoch time in milliseconds
            "consumerSystemCode": 10,
            "quantity": 61
        },
        {
            "monthYear": 1635735600000,
            "consumerSystemCode": 10,
            "quantity": 220
        },
        {
            "monthYear": 1635735600000,
            "consumerSystemCode": 11,
            "quantity": 34032
        },
        {
            "monthYear": 1638327600000,
            "consumerSystemCode": 11,
            "quantity": 179301
        }
    ],

    "consumerSystemCodes": [
        {
            "id": 10,
            "description": "XPTO System"
        },
        {
            "id": 11,
            "description": "Foo System"
        },
        {
            "id": 12,
            "description": "Another System"
        }
    ]
}
```

Note that in this example the first list contains information about systems with identifiers 10 and 11, while the list of consumerSystemCodes contains identifiers with numbers 10, 11, and 12. This is important to understand what went wrong in my code.

How I solved the problem (without Copilot's help)

I wasn't using Copilot yet while building the first iteration of this screen. It contained a general report, with a summary gathering all systems, followed by a more detailed report, separated into sections (one for each system).

To implement this, it is necessary to start from one of the two lists: hits or consumerSystemCodes. A more distracted person might start from consumerSystemCodes, since it seems easier: one could just map over it and generate a section on the screen for each system. Or, even better, generate some intermediate data structure first, so it can be sorted and transformed however necessary before moving on to the view layer.

After reflecting a bit, I chose to start with the hits list, for two main reasons:

1- Nothing prevents the hits list from having an entry with an id that is not present in the consumerSystemCodes list. I wanted to be sure that all entries would be displayed on the screen and, in case this unexpected situation happened, that a new section was created, titled "Unknown System (ID: x)" (one section for each unknown identifier).

2- The opposite can also occur: the consumerSystemCodes list may contain systems that have no data in the hits list, which would mean displaying sections with empty information.

There are several ways to get around these problems. The way I chose to do it was to start processing with the hits list.

So I ended up with code that, I confess, was more complex than I initially imagined. I had to create some helper functions to get the needed data, but in the end, it worked fine.
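To make that approach concrete, here is a minimal Elm sketch of the idea. All type and function names here are hypothetical, not the actual project code: start from hits, collect the distinct system codes that have data, and resolve each one to a section title with a fallback for unknown codes.

```elm
module Report exposing (sectionTitles)

-- Hypothetical types mirroring the simplified JSON above.
type alias Hit =
    { monthYear : Int
    , consumerSystemCode : Int
    , quantity : Int
    }

type alias ConsumerSystem =
    { id : Int
    , description : String
    }

-- Start from the hits list: collect the distinct system codes that
-- actually have data, then resolve each one to a section title.
sectionTitles : List Hit -> List ConsumerSystem -> List String
sectionTitles hits systems =
    hits
        |> List.map .consumerSystemCode
        |> unique
        |> List.map (titleFor systems)

-- A code present in hits but absent from consumerSystemCodes still
-- gets a section, titled "Unknown System (ID: x)".
titleFor : List ConsumerSystem -> Int -> String
titleFor systems code =
    systems
        |> List.filter (\system -> system.id == code)
        |> List.head
        |> Maybe.map .description
        |> Maybe.withDefault
            ("Unknown System (ID: " ++ String.fromInt code ++ ")")

-- Keep the first occurrence of each code, preserving order.
unique : List Int -> List Int
unique codes =
    List.foldl
        (\code acc ->
            if List.member code acc then
                acc
            else
                acc ++ [ code ]
        )
        []
        codes
```

Starting from hits means systems with no data produce no empty sections, and codes missing from consumerSystemCodes still show up, covering both problems above.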

Rearranging with the help of Copilot

I was eager to see the results on the screen and the code reflected that: it was messy and disorganized. But luckily it had been written in Elm! And the compiler was there, ready to help me with the reorganization. And this time I was already using Copilot.

When I started to change the design of the code, Copilot tried to implement it in the simplest way possible, through the consumerSystemCodes list. My mind was no longer focused on the potential problems of this path and... I let Copilot guide me. Suddenly I was smiling, deleting various auxiliary functions that were "no longer needed". And, unfortunately, there were no automated tests to help me with this part of the redesign. Then I opened the system, pointed at my development environment, and... all good! All reports continued to appear as expected.

I was so excited! I remember texting a colleague to celebrate how Copilot had helped me drastically reduce the complexity of my code! 🥳

Time to push and let the pipeline publish to production.

As soon as I opened the page in production, I realized that something was wrong. 🪲🐞🪳 Multiple sections with empty information! In development this didn't happen, since the database content was different. Luckily for me, although it was the production environment, I hadn't made the link available in the menu yet, so no one else had access to the page and I had time to analyze what was happening. My first reaction was to create a filter to remove the empty entries. But that didn't happen before! What had changed? That's when I realized the source of the problem. And it wasn't just a matter of filtering the empty entries, since doing only that wouldn't solve problem 1 mentioned above.

Once again I needed to change the code, this time returning to a logic closer to the original implementation. And once again, Copilot insisted on starting the process from the consumerSystemCodes list. Every autocomplete took me the wrong way, and suddenly I was almost re-implementing the logic the wrong way again! Even knowing what needed to be done, it was hard not to let Copilot lead me down the wrong path. 🤦

After some struggle, I won! But the battle left a bitter taste in my mouth. And I spent hours thinking: would I have introduced this bug if not for Copilot? Probably not. Could I have avoided it with more automated testing? Probably yes. What if I had been doing mob or pair programming? Maybe someone would have spotted the problem. And would an asynchronous code review have caught it? I think it's unlikely, but maybe.

But this was just one of the scenarios where something like this happened. There weren't many, and this was the most annoying, but they were enough to start causing me some discomfort.

More weird code

At some point, I was reviewing the code and noticed something VERY strange: a function was returning an anonymous function, for no apparent reason. I don't know exactly when that happened, but my hypothesis is that I had created a new function and forgotten to include one of the parameters needed to do what I wanted. Any developer would notice this right away and change the function signature, but Copilot is not able to change existing code; it only suggests new code. To work around the problem, I think it created an anonymous function with the missing parameter! The rest of the code was as expected and did the right thing, which is perhaps why I accepted the suggestion without noticing the anonymous function. 😳

I wondered what it would be like if someone else encountered this in my code! What would they think? It made no sense! Even more so in Elm, where every function is curried by default! If it weren't my code, maybe I'd even be afraid to change it; after all, no human being would write something like that without a good reason. I could spend hours looking at those lines without understanding why it was returning an anonymous function.
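A reconstructed illustration of the shape this took (the real code was different; these names are hypothetical). Because Elm curries every function automatically, the anonymous function adds nothing:

```elm
-- The shape Copilot produced: a function returning an anonymous
-- function, for no apparent reason.
formatHit : String -> (Int -> String)
formatHit label =
    \quantity -> label ++ ": " ++ String.fromInt quantity

-- Since every Elm function is curried by default, this ordinary
-- form behaves identically, including under partial application:
formatHit2 : String -> Int -> String
formatHit2 label quantity =
    label ++ ": " ++ String.fromInt quantity

-- Both can be partially applied the same way:
-- List.map (formatHit "hits") quantities
-- List.map (formatHit2 "hits") quantities
```

To a human reader, the first version suggests there was a reason for the extra lambda, which is exactly what makes it confusing.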

Trust or distrust autocomplete?

After a while, I realized that even though I was quite skeptical at first and understood the limitations of an AI-assisted tool, it got it right enough times to slowly gain my trust. And that needed to change.

Ever since I started working professionally as a software developer, I've been using some kind of code autocomplete tool. In general, simple things like the name of a function or method. One of the advantages of autocompletes in IDEs and code editors is that I can trust them. If my IDE is telling me that this object has a certain method or if a certain module has a certain function, I've learned to believe this information. When I don't remember if the function is called getTaxpayerStatus or retrieveTaxpayerStatus and the first option appears in my editor, I know that this is the correct name and trust it.

But from the moment that my autocomplete starts to hallucinate and invent names for functions or methods, I need to be more cautious. Of course, my IDE (or code editor) autocomplete and Copilot autocomplete are visually quite different, but I need to "turn off" a switch in my brain, which is used to trusting all autocomplete. Now I always have to ask myself: is this based on facts or was this generated by a plugin that is just making things up based on probability and statistics?

How to proceed?

Copilot positively surprised me in many ways. I studied AI during my master's degree and didn't expect that it would be possible to have such an advanced tool so soon.

And in the first days it was pure joy, but after a month and a few bugs that were non-trivial to detect, using the tool started to generate some stress and anxiety. Something that should have lessened my cognitive load had started to increase it.

I started to distrust everything that was generated, and that pleasant feeling of the first days became less common.

This doesn't mean that I think GitHub Copilot is bad or that I don't recommend it. On the contrary. It was a really cool experience! And I recommend that every developer try it out and draw their conclusions. I think it's likely that we're moving towards a time when this type of technology will become part of (almost) every developer's day-to-day. Whether this will be good or bad, only time will tell.

But still, I chose not to use it anymore. At least for a while.

Although I didn't do any measurements, turning this tool off does seem to have slowed me down. Writing filters, simple parsers, and boilerplate in general has become more laborious. But this is not necessarily a bad thing. Taking time to reflect on what I'm doing has taken a weight off my shoulders.

Copilot's slogan is: "Your AI pair programmer". And yes, it felt like there was a person programming with me all the time - but an annoying one, who wouldn't shut up for a second. 📣 It was like doing pair programming, but with someone who has no idea what it means to work this way. Who understands absolutely nothing about the problem I want to solve. And when something goes wrong, blames me.

In the first hours without Copilot, I really missed it! I was already used to typing the name of a function and seeing the code being generated. But I soon readjusted to the way I programmed before. And I felt more in control. More relaxed.

I still don't know if (or when) I'll ever use a technology like this again, but for now, I'm fine without it. When I need something very specific, I talk to ChatGPT and that has been enough for me.


And what about you? Have you tried Copilot yet? What are your thoughts? Leave your comments below!


Did you like this text? Check out my other articles at: https://segunda.tech/tags/english and follow me on Twitter and Blue Sky.

Top comments (12)

KinsonDigital

I have had the same experience with Copilot. I was skeptical about fully trusting it, and after using it, I still am. I think it is VERY dangerous for any developer to do so. We HAVE to know and understand what is going on in our software. Today, I simply use it to help write simple boilerplate and reduce my typing. But even then, I am always rereading and making sure that it is correct.

When it comes to writing business logic and algorithms, I write it myself.

Marcio Frayze • Edited

Thanks for your comment.

Your approach is probably the best of both worlds. And I may come back to it someday.

I have to say I fear what is ahead of us. It's an amazing tool, but I miss people talking more about the pitfalls and how to prevent them.

KinsonDigital

I fear for it as well. IMHO, the problems that people keep talking about when it comes to AI and its taking over our jobs, the world, etc, is not a doomsday scenario and is not due to AI. AI cannot think for itself. All problems that stem from AI will be human in origin, not an AI problem itself.

With all things invented since the beginning of human civilization, they will be abused by humans. Everything we have are tools which we are responsible for. Whether it is weapons, software, cars, or financial tools that sway the markets in a negative manner that hurt people, they are all tools used by humans that intentionally or unintentionally cause harm.

Like I always tell everybody, humans are ultimately responsible. IMO, this is why AI is a human problem.

--End Rant--

😁

vladi160

I have the same experience. ChatGPT (the OpenAI version) is much better, but I found Copilot is good for design things. When you need to design something and you know what you want, but not how to do it, it (sometimes) generates good styles.

Samuel Viana

I've been using Copilot for almost a year. Yes, I usually accept CP's suggestions, and I always try to understand the suggested code and test it. Instead of being the artisan, you're turning into a curator, trying to minimize CP's wrong decisions. Because in the end, the code is yours, not CP's.

Arjun Sharma

I had been using Copilot for quite some time before I stopped, because I found myself not understanding my own code, which tells a lot. I strongly believe that no developer should use Copilot, for the reasons you have stated so clearly.
Nobody wants to make shitty applications on purpose and be there to see their corpse rotting for eternity.

Tim Donselaar

I use Copilot all the time. It really needs to be used as a "copilot", not as a silver bullet.

I program as normal, and if a suggestion fits what I'm writing, I use it. If it's 80-90% there, I use it and change it to my needs.
I always verify the code and know exactly what it does.

I would not suggest it to developers that aren't proficient in the programming language that they are using.

A tip: Copilot scans all open tabs. So having the tabs of your underlying classes/functions open gives Copilot more context and ends up with much better results.

John Papa

Marcio - Thank you for sharing your experience with GitHub Copilot.

Carlos Saltos

Cool article, thanks for sharing.

For me, personally, the speed of typing code is not an issue. As a human, I type slowly compared with a bot, but what's required is quality, not speed 😅

Joff Tiquez

If you let robot-written code go into production without noticing it, that makes it kinda your fault.

Marcio Frayze

Sure, I’m responsible for all my commits. That’s exactly my point.

Never said it wasn’t my fault. It’s totally my fault. I’m the one in control. So in control that I can choose not to use it anymore.

Joff Tiquez

Cool, understood. Never meant to sound rude. Cheers!