DEV Community

Max Patiiuk
Max Patiiuk

Posted on

Shortcomings of GitHub Copilot

TLDR ✨: I didn’t have much luck with GitHub copilot on larger code snippets - I don’t like the code quality. It was trained on whole bunch of average code, and that's what it produces. It's like having a smart intern, rather than a senior mentor. It’s awesome for 1-3 line auto completion though.

With all the buzz around GitHub Copilot and how much more productive it makes you I was quick to try it out. After using it for a year, I have a few observations to share.

One of the most exciting use cases was automating the test-writing. Tests are super important, yet I won't be the first to say that it's not the most thrilling part of software development.

With just a small prompt, GitHub Copilot can generate the entire test case:

An entire test case generated by GitHub Copilot when given just a test name

And another one:

GitHub Copilot writing a test for a failing request

First, I was immensely excited about what this will do to my productivity ... until I looked closer at the code.

The problems with above code

  1. It looks like the solution uses React Testing Library. React Testing Library discourages usage of test ids, in favor of finding elements by their semantic meaning or label - that makes tests less brutal, and tests accessibility at the same time. In my codebase, I have no test ids, almost zero ids, and almost zero class names (excluding tailwind).
  2. fireEvent is no longer a recommended way to mock events in Tests. See @testing-library/user-event.
  3. IPLookupResultContainers variable is assigned to elements on the page (querySelectorAll). Then why does the test calls .toBeInTheDocument() for them afterward?
  4. Some people don't like snapshots, but I think it might work nicely in this case - just take the snapshot of the page after the results are loaded - snapshots can also catch refactoring mistakes (i.e, you changed a valid class name to a typo accidentally).
  5. Prefer StrictPascalCase and strictCamelCase over PascalCase or camelCase for variable naming. That means no consecutive capitalized letters - ever. That prevents abominations like XMLHttpRequest, where two acronyms are inconsistently capitalized. Not capitalizing acronyms improves readability (HTTPServerError vs HttpServerError) and prevents inconsistency in capitalization.
  6. This is a matter of preference, but I would have split this test into one that tests the fetching logic and mocks the network request, and another that receives mocked data and just outputs it to the screen.
  7. While it's a common convention to use "it" instead of "test" inside of "describe", I find it quite silly. Why have inconsistency? Why do I have to refactor all my "test" calls into "it" if I decide to wrap tests in "describe" and refactor them back when I decide to put them outside of "describe"?

For most of these things, I have ESLint rules so it catches these mistakes in my code and that of my teammates. Even better when ESLint can do an auto fix.

The code snippet encapsulated what I have been seeing in GitHub Copilot so far:

  1. It has a big bias for what's most common, even if it's no longer recommended (I have seen it use "let" or even "var" instead of "const").
  2. It's slow to catch up to newer conventions because it's trained on a ton of code and so the law of averages comes into play.
  3. It also doesn't always handle the edge cases just because the code Copilot is copying from does not cover the edge cases.
  4. I also saw GitHub copilot trying to style a <div> like a button 😢.

Still, the fact that it can produce semi-working code at all is very impressive in the grand scheme of things.

I don't know how they would solve this problem though. By definition, there is a ton more average code and junior developers than great code and senior devs. Similarly, there would be way less code for newer shiny things than for established older practices. Thus, it would be hard for GitHub to 1. discriminate between average and great code, 2. find enough great code to train on.

And then add to it the fact that a lot of code quality things are subjective and differ between languages. Heck, I was shocked that some of React best practices are considered code smell in Vue and vice versa (see my article on this).

When I did an internship at Amazon, I was shocked when I was told that they don't follow best practices and instead do what seems best for the problem at hand - kind of makes sense in retrospect because we are not in a perfect world and there is not enough time to write perfect code, and even if you did that, the environment changes all the time so the code would get out of date, and best practices have to be generic and may not apply to your project.

Top comments (0)